CHAPTER I

INTRODUCTION*

How does the good work-group leader
differ from the man who fails as a
leader? Are there basic personality factors
that characterize the successful leader
that are not found in the unsuccessful
leader? The current literature bulges
with answers to these questions; but most
of the answers are merely opinions. Very
little attempt has been made to define
operationally and to quantify the traits
or characteristics that are related to
quality of leadership. It is the purpose
of this study to contribute some objective
data which will provide at least a partial
answer to these questions.

Specifically, we shall attempt to determine
whether there is a relationship
between a person’s ability as a leader of
others in a work-group situation and
measures of certain personality characteristics
which many authorities on
leadership believe to be essential to success.
Also, we shall attempt to determine
whether there is a relationship between
work-group leadership ability and a person’s
interests, background, or personal
history.

In the attempt to measure characteristics related
to quality of leadership, several problems
must be dealt with. The first of these is the
question of definition. Many definitions of the
“good” leader have been presented in the literature,
not all of which are in agreement. To
present another conceptual definition of good
leadership seems to be futile. An operational
definition is needed. Therefore, we shall attempt
to establish a criterion measure of leadership
ability which will not only suffice to define the
quality, but also serve as an operational standard
against which to evaluate measures.

* This monograph is a revision of a dissertation
submitted in partial fulfillment of the
requirements for the degree of Doctor of Philosophy
at the University of Michigan. During
the period of this study the author was employed
by The Detroit Edison Company. Subsequent to
its completion, he joined the staff of The Psychological
Corporation in New York.

A second factor which has made the attempt to 
study characteristics of the successful leader diffi- 
cult is the evidence that leadership is a situation- 
ally determined phenomenon. The social psy- 
chologists, especially, insist that the study of 
leadership calls for the situational approach 
(2, 5, 6, 9, 20). The literature presents evidence 
to support this contention (1, 9, 16). It seems wise 
to use these findings as an operational starting 
point. Therefore, the scope of this study will be 
limited to leaders of work groups only; and it 
will be limited further to ability to handle the 
“human relations” aspect of work-group leader- 
ship. 

However, it still might be claimed that the 
personal characteristics necessary for dealing suc- 
cessfully with the human relations problems of 
supervision may differ from one group to an- 
other, depending upon the characteristics of the 
members of the groups, the type of work the 
groups perform, the needs of and outside pres- 
sures on the members of the groups, and other 
similar factors. This hypothesis will be tested in 
the present study, as far as is practicably pos- 
sible, by analyzing separately the differences be- 
tween good and poor leaders of different types 
of work groups. 

A third major difficulty which has been en- 
countered in the study of leaders is the fact 
that persons who possess very different person- 
ality characteristics may be successful in the 
same leadership situation (10, 16). To the extent 
that this is true, the results of this study will be 
negative. However, it seems reasonable to hy- 
pothesize that, while successful leaders may differ 
in many traits, some common characteristics of 
the personality make-up of individuals may be 
found which are related to the way in which 
they react to others as leaders in work-group 
situations.

CHAPTER II


PROCEDURE FOR CONDUCTING THE STUDY 


A. GENERAL QUESTIONS TO WHICH 
ANSWERS ARE SOUGHT 


A REVIEW of the literature on leadership
reveals one very striking fact.
The qualities found important in the 
empirical studies, especially those dealing 
with the selection of supervisors and 
executives, are not the qualities given 
most emphasis by the writers who are 
considered to be authorities on supervision
and leadership. This discrepancy
is particularly conspicuous with regard 
to certain social attitudes. Almost in- 
variably the authorities on leadership 
stress such qualities as ability to see the 
other person’s point of view, sensitive- 
ness to human traits and reactions, an 
interest in people, human understand- 
ing, and the like. However, about the 
only measure related to these qualities 
that comes out in empirical studies is 
“social service” interest as measured by 
the Kuder Preference Record. And this is 
sometimes found to correlate negatively 
with success as a leader, depending upon 
the criterion of leadership success used 
(15, 17).

Do good and poor work-group leaders 
really differ in some of these character- 
istics that authorities consider impor- 
tant, but that are seldom, if ever, meas- 
ured? If any of these differences actually 
exist, can they be measured objectively? 
Do these differences exist only among 
supervisors of some types of work-groups, 
supervisors of office workers, for ex- 
ample, but not among supervisors of 
other types of work-groups, supervisors of 
manual workers, for example? How do 
the interests of good leaders differ from 
those of poor leaders? Do good and poor
work-group leaders differ in their backgrounds?
This study was undertaken because
it was felt that the answers to some
of these questions were either lacking or 
not very clear in the literature. 


B. EXPERIMENTAL DESIGN 


The study was carried out in a large 
utility company as a first step in a long- 
term project designed to improve the 
process of selecting new supervisors. The 
procedure for conducting the study was, 
basically, very simple. Several tests were 
administered to, and other quantifiable 
data were obtained from, a number of 
first-line supervisors. In addition, detailed 
ratings of each supervisor’s ability as a 
leader were obtained from the super- 
visors’ superiors. The relationships be- 
tween these measures were determined 
by computing Pearson product-moment 
correlation coefficients and by determin- 
ing the significance of differences be- 
tween extreme groups on the criterion 
measure. 
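
As an illustration only, the two kinds of analysis named above may be sketched in Python. The data, the function names, the pooled-variance t statistic, and the 27 per cent cut-off for the extreme groups are all assumptions made for this example; the study does not state what proportion of cases formed its extreme groups, nor which significance test was applied to them.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def extreme_groups_t(scores, criterion, fraction=0.27):
    """Pooled-variance t for the difference in mean test score between the
    cases highest and lowest on the criterion (assumed cut-off: `fraction`)."""
    order = sorted(range(len(criterion)), key=lambda i: criterion[i])
    k = max(2, round(len(order) * fraction))  # cases in each extreme group
    low = [scores[i] for i in order[:k]]
    high = [scores[i] for i in order[-k:]]
    # Both groups have k cases, so the pooled variance is the simple average.
    pooled = (statistics.variance(low) + statistics.variance(high)) / 2
    standard_error = math.sqrt(pooled * 2 / k)
    return (statistics.fmean(high) - statistics.fmean(low)) / standard_error


# Hypothetical test scores and criterion ratings for ten supervisors.
test_scores = [12, 15, 11, 18, 20, 14, 17, 22, 13, 19]
ratings = [40, 48, 38, 55, 60, 45, 50, 66, 42, 58]

r = pearson_r(test_scores, ratings)
t = extreme_groups_t(test_scores, ratings, fraction=0.3)
```

The correlation uses every case, whereas the extreme-groups comparison discards the middle of the criterion distribution; the two may therefore disagree when the relationship is not linear.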


1. The Sample 


The first-line supervisors in the four 
major departments of the Company were 
chosen for study. The departments in- 
cluded were Production, Overhead Lines, 
Sales, and Accounting. All of the first- 
line supervisors in the Production De- 
partment, all in three representative dis- 
tricts of the Overhead Lines Department, 
and all in two representative districts of 
the Sales Department were tested except 
for those on vacation or out because of 
sickness at the time of the study. How- 
ever, in the Accounting Department, at 
the request of the department head, only
those who volunteered to participate in
the study could be tested. Participation
of the supervisors in the other departments
was also on a voluntary basis, but
the voluntary nature of the project was 
explained at the testing sessions. None 
of those to whom the project was ex- 
plained declined participation in the 
study. However, in the Accounting De- 
partment the voluntary nature of the 
project was explained ahead of time by 
staff persons in the department rather 
than by the investigators. Only 65 per 
cent volunteered. Many who did not par- 
ticipate “would have liked to,” but they 
were “just too busy.” Fortunately, ratings 
could be obtained on all the supervisors 
in this department. 


As one might expect, those who had volun- 
teered were rated higher as a group than those 
who did not. However, the difference was smaller 
than might be expected. The mean rating of the 
group tested fell at the 57th centile on the dis- 
tribution of the ratings of all supervisors in the 
department. The difference between the mean 
rating of those tested and the mean rating of 
those not tested is significant at the six per cent 
level of confidence.* 


Test scores and ratings were obtained 
on a total of 226 supervisors. However,
36 were eliminated for the reasons enumerated
below, leaving a total of 190
cases in the final sample. 


Several of those eliminated had been included
in the testing for “diplomatic” reasons, but the
data obtained from them could not be included
in the analysis for two reasons. First, nine were
eliminated because they supervised only one
other person. For this study leaders of work
groups were defined as those persons, classified
by the Company as supervisors, who direct and
are responsible for the work of at least two
other persons. Secondly, ten of the supervisors
tested were women. It was decided to eliminate
data obtained from them because the number
was too small to analyze with reasonable accuracy
possible differences between them and the
men on the measures obtained. For the other 17
cases eliminated from the final sample, it was
impossible to obtain a reliable criterion measure.

* The distribution of ratings for those not
tested was skewed slightly to the left. The mean
rating for this group was 46.04, and the median
rating was 49. The distributions of ratings for
those who did and those who did not take the
tests were pretty well matched except at the
upper ends. All of the supervisors who were
rated very high participated in the study.

The sample was divided into several 
different subgroupings for analyzing the 
data. This is illustrated graphically in 
Figure 1. In the first place, the data 
were analyzed separately by departments. 
Secondly, the sample was divided into 
two parts on the basis of the type of 
work-groups the subjects supervised. 
These were (a) supervisors of office work- 
ers and (b) supervisors of manual work- 
ers (those classified by the Company as 
Trades and Operating groups). The first 
category included 77 of the supervisors, 
and the second included 113. In the 
third place, the total sample was split 
into two sub-samples, called Sample A 
and Sample B, for item analyzing and 
cross-validating the data obtained on certain
measures. Two-thirds of the cases
were included in Sample A, and the remainder
in Sample B. Samples A and B
were selected to be representative of the 
whole sample as far as department and 
type of work-group led were concerned, 
but randomly selected within those 
groups. Sample A (the two thirds) was 
used for purposes of item analysis of cer- 
tain of the tests, and Sample B was used 
for cross-validating those tests which had 
been revised on the basis of item analysis.
This procedure is described in detail in 
Chapter V. 
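
The division into Samples A and B can be pictured as a stratified random split: cases are grouped by department and type of work-group, and two-thirds of each group are drawn at random for Sample A. The following is a minimal sketch under that reading. The field names `dept` and `group`, the stratum sizes, the pairing of departments with group types, and the random seed are hypothetical and are not taken from the study.

```python
import random
from collections import defaultdict


def stratified_split(cases, key, fraction_a=2 / 3, seed=0):
    """Split cases into (sample_a, sample_b), drawing `fraction_a` of each
    stratum at random for Sample A and leaving the rest for Sample B."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for case in cases:
        strata[key(case)].append(case)

    sample_a, sample_b = [], []
    for stratum in sorted(strata):  # fixed order keeps the split repeatable
        members = strata[stratum]
        rng.shuffle(members)  # random selection within the stratum
        cut = round(len(members) * fraction_a)
        sample_a.extend(members[:cut])
        sample_b.extend(members[cut:])
    return sample_a, sample_b


# Hypothetical stratum sizes (not the study's actual counts).
sizes = {
    ("Production", "manual"): 12,
    ("Overhead Lines", "manual"): 9,
    ("Sales", "office"): 6,
    ("Accounting", "office"): 3,
}
cases = []
for (dept, group), n in sizes.items():
    for _ in range(n):
        cases.append({"id": len(cases), "dept": dept, "group": group})

sample_a, sample_b = stratified_split(
    cases, key=lambda c: (c["dept"], c["group"])
)
```

Splitting within strata keeps each sub-sample proportionate in department and group type, which is the sense in which the two samples are described as representative.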

The average age of the supervisors in 
the sample was 45.73 years. The size of 
their work-groups ranged from 2 to 71, 
with a median value of 9.1. The average 
length of time they had served as a su- 
pervisor was 7.87 years. 

In addition to the sample described
above, a number of supervisors and supervisory
applicants employed by the
City of Detroit were used for pre-testing
the measures and experimental editions
thereof.

[Figure 1 not reproduced. Panels: (1) total sample divided by
departments; (2) total sample divided by types of groups supervised
(supervisors of Trades and Operating employees, supervisors of Office
employees); (3) total sample divided into two representative random
samples for validating measures (Sample A, for item analyzing measures
to develop empirical scoring keys; Sample B, for cross-validating the
empirical scoring keys developed on Sample A).]

FIG. 1. Different ways in which the total sample was divided for analysis


2. The Measures 


A rating scale of the check-list kind 
was developed to serve as the criterion 
measure for validating tests and question- 
naires. In this manner it was possible to 
obtain a detailed description, from which 
an objective score could be obtained, of 
the way in which each supervisor handled 
the human relations aspect of his job. 
The immediate superiors of each super- 
visor tested were asked to supply the 
ratings. In each case as many persons as 
possible, who were familiar with the 
first-line supervisor’s work, were asked to 
furnish ratings. At least two ratings were 
obtained for each supervisor in the 
sample. 

Three tests were developed to measure 
differences in the social attitudes of the
supervisors. The longest of the three, and
the one on which the most time and
effort was spent in development, became 
in its final form a multiple-choice pro- 
jective test. This was called the Social 
Judgment Test. In the other two tests 
forced-choice techniques were used. 
These were named Supervisor's Opinion- 
naire and Description of Supervisors. 
Also, an inventory (called the Personal 
Data Inventory) was developed to deter- 
mine differences in the interests and 
backgrounds of good and poor supervisors.

In addition, the Wonderlic Personnel 
Test (a test which purports to measure 
general problem-solving ability) and the 
Word Fluency test of the Chicago Test 
of Primary Mental Abilities battery were 
administered as control measures. Also, 
the age and length of service as a supervisor
of each member of the sample were
obtained from Company personnel records.

All of the variables investigated may be presented
more explicitly in a list as follows:

1. Social Judgment Test (the projective test)
2. Supervisor’s Opinionnaire (forced-choice)
3. Description of Supervisors (forced-choice)
4. Personal Data Inventory (for obtaining
   interests and background information)
5. Personnel Test (general mental ability)
6. Word Fluency Test
7. Age
8. Years of experience as a supervisor
9. Ratings of on-the-job performance as a
   supervisor

3. Administration of the Measures 

The tests were administered to the 
supervisors in groups of from 2 to 24. 
Usually from 20 to 30 minutes were 
spent at the beginning of each testing 
session to explain very thoroughly the 
nature and purpose of the study and to 
gain the full cooperation of the partici- 
pants. The tests were then administered. 
It usually took the supervisors from one 
and one-half to two hours to complete 
them. Only the tests used for controls 
were administered with a time-limit, but 
the testees were encouraged to work rap- 
idly on the others. The supervisors were 
assured that the data collected for this 
study would be available to no one out- 
side the Industrial Psychology Division 
of the Company. 

The Personal Data Inventory was not 
administered at the testing sessions. Each 
testee was provided with a stamped, addressed
envelope and asked to fill out the
form on his own time and to return it
through the mail.

The ratings were obtained in separate 
meetings with the supervisors’ superiors 
in each department, division, or work 
location. Here, again, the study was explained
very thoroughly before the ratings
were obtained, and the confidential
nature of the data secured was emphasized.

CHAPTER III


THE CRITERION OF SUCCESSFUL LEADERSHIP 


THE crux of a study such as this one
lies in the criterion against which
the measures developed are validated. 
Therefore, a great deal of time and effort 
was spent in attempting to develop a 
valid measure of the criterion, leadership 
quality. Many difficulties were encoun- 
tered in the process. 


In the first place, it proved to be very difficult 
to obtain a comparable measure of leadership 
ability for each member of the sample. No ob- 
jective measure was available for this. Therefore, 
it was necessary to depend upon the subjective 
judgments of those with whom the leaders came 
into contact in their work. For this study it 
probably would have been well to obtain the 
judgments of the members of the groups led. 
However, this was not feasible for practical 
reasons. Therefore, it was necessary to depend 
upon the judgments of the supervisors’ superiors. 
Obtaining a measure from this group which 
would be strictly comparable for each super- 
visor in the sample was almost impossible, since 
each judge could compare the supervisor rated 
only to others whose positions were similar. That 
is, it was very difficult, if not impossible, for one 
rater to compare the ability of supervisors in 
different departments, or to compare supervisors 
of manual workers with supervisors of clerical 
workers in the same department. 

In order to obviate this difficulty, at least to 
some degree, it was decided to attempt to ob- 
tain a detailed description of the manner in 
which each supervisor performed his job. The 
weighted check-list type of rating scale seemed to 
meet this requirement most adequately. This 
type of scale also serves to reduce the variability 
of the raters’ individual interpretations of 
“good” leadership. 


A. CONSTRUCTION OF THE RATING SCALE 


1. The Compilation of Items 


As the first step in the development of 
the rating scale, a number of statements 
were compiled, each describing the be- 
havior of a supervisor in performing his 
duties.¹ The statements were composed
from ideas presented in textbooks on
leadership, leadership training materials,
and from other rating scales of leadership
quality.²

¹ For example, statements referring to a supervisor’s
performance in the Human Relations
area of supervision, such as the following, were
used:

He sometimes argues with his men.
He is more likely to ask a man to do a job than to order him to do it.
He is a little hesitant about giving a worker credit when credit is due.
He uses praise more often than censure.
He takes a personal interest in the welfare of each of his workers.
He over-supervises: allows his workers little opportunity to show initiative.

² Some of the statements were obtained from a
check-list rating scale being developed at Purdue
University. These items were reproduced by
special arrangement with Professor C. H. Lawshe.

The statements were carefully edited, then
classified into the following three major groups,
depending upon the area of supervision to
which they referred: (a) Job Knowledge: how
well the supervisor knows or how well he can do
the jobs performed by the workers under him;
(b) Administrative: how effectively the supervisor
plans his work, organizes and assigns tasks
for subordinates, takes care of records, etc.; and
(c) Human Relations: how well the supervisor
handles the personnel problems of supervision,
that is, how successful he is in gaining the respect and
enlisting the cooperation of his subordinates.

For this particular study a measure of supervisory
ability in only the third area, Human Relations,
was desired. However, it was decided to
include the other two areas in the rating scale
for several reasons. First, it was anticipated that
many raters, presented with a rating scale which
included only the human relations aspect of
supervision, would protest that there is more to
supervision than ability to handle people. Secondly,
it was thought that the inclusion of the
ratings in other areas might help in evaluating
the validity of the ratings in human relations.
For example, since the three areas are generally
considered to be relatively independent, the degree
to which the rater’s general impression of a
man affects his rating of human relations ability
can be evaluated, at least to some extent. In the
third place, it was felt that by requiring the
rater to consider the ability of each supervisor
in several areas, the rating in human relations
might be less influenced by his general impression
of the man. That is, it would not only have
the effect of emphasizing the fact that ability in
human relations might be independent of ability
in other areas of supervision, but also it would
make it easier for some raters to indicate a
supervisor’s weakness in this area if he could
also show that the supervisor was relatively
strong in another area. This might be especially
true in the case of the supervisor who is considered
to be a fairly good man in all-around
ability, possibly because of his ability as an
administrator, but who is weak in the human
relations area.


2. Determining Weights for the Items 


Twenty statements in the Job Knowledge
area, 65 statements in the Administrative
area, and 75 statements in the
Human Relations area were selected, 
placed on cards (one statement to a 
card), and arranged in packs for sorting. 
Approximately fifty people in the Com- 
pany in which this study was carried out 
were asked to sort the statements. The 
conventional procedures of the method 
of equal appearing intervals were em- 
ployed. The cards were sorted into seven 
piles, representing degrees of supervisory 
ability as indicated by the statements. 
For example, those statements which described
a very high degree of ability,
possessed by only those few supervisors 
who were outstanding in the area being 
considered, were to be placed in pile 
number seven. The statements in only 
one area were sorted at one time by a 
sorter. That is, for example, he compared 
statements in the Human Relations area 
only. 

The group of persons who sorted the state- 
ments consisted for the most part of persons 
doing personnel work. A few were employment 
interviewers, several dealt with personal prob- 
lems of workers, several were responsible for the 
supervisory training program, a few of them 
were personnel coordinators for the different de- 
partments in the Company, and the like. Many 
of the sorters were supervisors themselves. 

Most of the sorters or judges evaluated the 
statements in only one of the three areas. A few 
evaluated statements in more than one area, but
evaluated those in each area separately. For each
of the three areas, 20 different people sorted
the items. This number was determined largely
by practical considerations, but also in part by
considering the magnitude of the standard error
of a median scale value derived from this number
of judges.*


The median pile number into which 
each item was placed by the judges was 
computed as the scale value or weight 
for that item. The quartile deviation 
(Q) value for the distribution of each 
item was computed as a measure of dis- 
persion. 
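
The computation of an item's weight and its dispersion can be illustrated with a short sketch. It assumes the raw (ungrouped) median and the "inclusive" quartile rule of Python's `statistics` module; the conventional procedure for the method of equal appearing intervals often interpolates within piles, so the study's own figures may have been obtained somewhat differently. The pile numbers shown are hypothetical.

```python
import statistics


def scale_value_and_q(piles):
    """Return (median pile number, quartile deviation Q) for one item,
    where Q is half the distance between the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(piles, n=4, method="inclusive")
    return statistics.median(piles), (q3 - q1) / 2


# Hypothetical pile numbers (1 to 7) assigned to one item by 20 judges.
piles = [5] * 2 + [6] * 10 + [7] * 8

scale_value, q_value = scale_value_and_q(piles)
```

A small Q indicates that the judges agreed closely on where the statement belongs; a large Q marks the statement as ambiguous.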


3. Evaluation of the Scale Values 
Determined for the Items 


The scale values of the items used in 
the Human Relations part of the rating 
scale constitute the definition of leader- 
ship quality as it is used in this study. 
Recognizing this, the question now arose 
as to what extent the scale values derived 
from the judgments made by the group 
of people in the Company would differ 
from those that would be derived from 
another group. In order to test this, at 
least to some extent, a group of students 
from several classes in Personnel Meth- 
ods at Wayne University were asked to 
sort the statements. It was felt that the 
members of this group would be more 
familiar with the experts’ conception of 
leadership, as expressed in books and ar- 
ticles on supervision and leadership, than 
would the judges working in the Com- 
pany. 


Since students are relatively plentiful and easy
to obtain for studies such as this, it was decided
to determine also the extent to which the scale
values and Q values derived from the process of
sorting the statements on cards by the method
of equal appearing intervals would differ from
those obtained by merely rating the statements
presented in a list. This knowledge might prove
to be valuable for future projects of this kind,
since the rating scales are much simpler and less
expensive to prepare than are the materials for
sorting the items. Furthermore, Seashore and
Hevner (12) present evidence to show that the
judges find rating easier, faster, and more convenient
than sorting.

* The average standard error of the median scale
value for the items selected for the final edition
of the scale was .23. A difference of .64 in the
scale values of two items is significant at the
5 per cent level of confidence.
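
The footnoted figure of .64 can be reproduced by a short calculation, offered here as a plausible reconstruction and not as the author's stated method. It assumes that the two scale values are independent, that each carries the average standard error of .23, and that the 5 per cent level corresponds to a normal deviate of 1.96.

```latex
\[
\mathrm{S.E.}_{\text{diff}}
  = \sqrt{\mathrm{S.E.}_{1}^{2} + \mathrm{S.E.}_{2}^{2}}
  = \sqrt{(.23)^{2} + (.23)^{2}}
  \approx .325,
\qquad
1.96 \times .325 \approx .64 .
\]
```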

Each student was presented with an envelope 
containing a pack of cards on which were printed 
the behavior statements in either the Human 
Relations or Administrative areas, and a rating 
scale with the statements in the other of those 
two areas. (The statements in the Job Knowledge 
area were not included in this part of the study.) 
Thus no student sorted and rated the same state- 
ments, but sorted statements in one area and 
rated statements in the other. A total of 80 
students participated in this project. An equal 
number of the envelopes or kits of each type 
were distributed randomly, so that for each of 
the two areas of supervision considered and for 
each of the two methods of evaluating the state- 
ments, the judgments of 40 students were ob- 
tained. 

The scale values and Q values for the items 
were determined as they were for the Company 
group of judges. In general, it was found that 
the scale values and Q values obtained for the 
items, as evaluated by the two different groups 
of judges and by the two different techniques 
for judging, did not differ markedly. The corre- 
lation between the scale values derived from the 
two different techniques for judging the items 
(sorting the items by the method of equal ap- 
pearing intervals vs. rating the items in a list) 
was .985. The correlation between the scale values 
derived from the two different groups of judges 
(students vs. the Company group of judges, both 
sorting the items by the method of equal appearing
intervals) was .976. The mean differences
between the scale values and Q values obtained 
by the different judges and methods were not 
found to be statistically significant. However, 
the differences on a few individual items were 
significant.* 

* One would expect to find significant differences
on a few items by chance. That is, for
example, it would be expected that differences
on 5 items out of 100 would be found to be
statistically significant at the 5 per cent level
of confidence by chance. However, in this study
a few more significant differences than would
be expected were found.

As a further check on possible differences in
results from the sorting and rating processes, the
standard deviations of the scale values derived
from each of these methods were computed. It
was felt that the judges might tend to place
more of the items in the extreme categories (1
and 7) when sorting than when rating. However,
for the statements in the Administrative area
the standard deviations of the scale values were
found to be exactly the same (1.82) for both
methods. For the statements in the Human Relations
area the standard deviations differed
slightly in the expected direction (1.81 for rating
and 1.94 for sorting), but this difference was not
statistically significant.
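
The footnote's remark about chance differences can be made concrete with a binomial calculation. The sketch below assumes that the item comparisons are independent of one another, which is a simplification; the count of 9 significant items is hypothetical, since the study does not report how many were found.

```python
from math import comb


def prob_at_least(k, n, alpha=0.05):
    """Probability that k or more of n independent comparisons reach
    significance at level alpha when no true differences exist."""
    return sum(
        comb(n, i) * alpha**i * (1 - alpha) ** (n - i)
        for i in range(k, n + 1)
    )


n_items = 100
expected_by_chance = n_items * 0.05  # the footnote's "5 items out of 100"

# Hypothetical observed count: how surprising would 9 significant items be?
p_nine_or_more = prob_at_least(9, n_items)
```

If the probability for the observed count is small, chance alone is an unlikely explanation for the number of significant items.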

Thus, it appears that for all practical 
purposes the results obtained by sorting 
the items by the method of equal appear- 
ing intervals are not essentially different 
from those obtained by merely rating the 
items. This agrees with results found by 
Seashore and Hevner in their study to 
determine scale values of items for an 
attitude scale (12). 

The scale values derived from the 
judgments of the student and Company 
groups also differed very little. However, 
while the fact that the different tech- 
niques for judging items gave practically 
the same results has value for its general 
applicability, the fact that the two groups 
differed little in their judgments is of 
little general value. In constructing a 
rating scale of this kind for evaluating 
performance in a particular job, it is 
essential that the scale values for the 
items be derived from a group of persons 
qualified to judge performance in that 
job. The comparison of the judgments of 
the two groups served an important pur- 
pose in this study. The members of both 
groups had had enough experience and
training to evaluate leadership behavior. 
Since the training and experience of the 
two groups had been different, it was de- 
sirable to determine to what degree their 
conception of “good” leadership differed. 
It was found, for example, that the differences
on a few individual items
seemed to be due to a greater tendency
for the students to place items containing
absolute terms, such as “always,” “never,”
“very much,” and the like, in the extreme
categories. This might possibly
have been due to their inexperience and
consequent lack of knowledge of the true
importance of some of the activities to
which the statements containing the absolutes
referred.


4. Constructing the Final Edition of 
the Rating Scale 


Ten items in the Job Knowledge area, 
25 items in the Administrative area, and 
40 items in the Human Relations area
were selected for the final edition of the 
scale. These were selected on the basis of 
(a) their scale values, in order to include 
items all along the scale, and (b) their Q 
values, to eliminate the more ambiguous 
items. No item with a Q value greater 
than .9 in scale units was selected. The
mean Q values for the items selected 
were .58 for those in the Job Knowledge 
area, .56 for those in the Administrative 
area, and .59 for those in the Human 
Relations area.

In the final edition of the rating scale 
the items for each area were listed to- 
gether in random order in separate parts 
of the rating scale. At the end of the 
list of items in each part a simple, nu- 
merical rating continuum was provided, 
in order to give the rater an opportunity 
to express his general, over-all opinion 
of the ratee’s ability in that area of su- 
pervision. This was included for two 
major reasons. First, on the basis of past 
experience with the use of rating scales, 
it was anticipated that some raters would 
be disturbed because of the difficulty in 
determining exactly where they were 
placing a ratee on the scale by checking 
the items alone. Secondly, it was hoped 
that the rating might provide some valu- 
able additional information for evaluat- 
ing the leadership ability of each ratee. 

Three categories were provided for the 
rater to indicate the degree to which each 
statement described the supervisor being 
rated. The rater indicated that each 
statement either (a) described fairly well 
the ratee, (b) partially applied as a description
of the ratee, or (c) did not describe
the ratee or did not apply in the
particular supervisory situation. 

In addition to filling out the detailed 
rating form, each rater was asked to indi- 
cate his opinion as to the ratee’s general 
over-all ability as a supervisor. This was 
obtained on a separate form which pre- 
sented a simple numerical scale. 


B. ADMINISTRATION OF THE RATING SCALE 


The ratings were obtained from the 
supervisors at higher levels to whom the 
first-line supervisors reported. 

It was attempted to obtain at least two ratings 
of each supervisor in the sample with the de- 
tailed rating form. In a few cases this was not 
possible, since in some situations only one per- 
son was in a position to know enough about 
the ratee’s supervisory practices to answer the 
items in the form. For every member of the 
sample, it was attempted to get ratings from as 
many persons as possible, who were familiar 
enough with the work of the supervisor to fill 
out the rating form reliably. In some cases, as 
many as seven persons could provide a detailed 
rating of one supervisor. 


In addition to the ratings obtained 
with the weighted check-list scale, all of
the supervisors in the sample were 
ranked in ability to handle human rela- 
tions problems, by those executives and 
staff persons in each department who 
were in a position to be able to evaluate 
and compare the ability of at least eight 
of the supervisors participating in the 
study. 

Many of the persons who ranked
supervisors had also rated some or all of
the same supervisors with the detailed
rating form. For those persons who were 
familiar enough with the work of a num- 
ber of supervisors to rank them, but who 
were not asked to rate any of them, the 
experimenter considered it desirable to 
have these persons read the items in the 
Human Relations part of the rating 
scale, so that all of the rankers would be 
familiar with the conception of Human
Relations ability used for this study. 


C. RESULTS OBTAINED WITH THE USE OF
THE RATING AND RANKING FORMS 


1. Evaluation of the Data 


The median scale values for those 
statements marked “describes fairly well”
and those marked “partially applies” in 
the Human Relations part of the scale 
were computed for every completed 
rating. The median scale values for items 
in the Job Knowledge and Administra- 
tive areas were computed for a repre- 
sentative sample of 50 of the completed 
ratings only, in order to evaluate rela- 
tionships between the indices of ability 
obtained for the different areas of super- 
vision. 
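
The computation described above can be illustrated with a minimal sketch. It assumes that one median is taken separately for each response category, which is one reading of the text; the item names and scale values shown are hypothetical and are not taken from the rating scale itself.

```python
from statistics import median

# Categories that enter the index, as named in the text.
COUNTED = ("describes fairly well", "partially applies")


def median_scale_values(item_scale_values, responses):
    """Return {category: median scale value of the items marked with it}.

    item_scale_values: dict mapping item id -> scale value of that item
    responses:         dict mapping item id -> category marked by the rater
    Items marked with any other category are ignored.
    """
    by_category = {}
    for item, category in responses.items():
        if category in COUNTED:
            by_category.setdefault(category, []).append(item_scale_values[item])
    return {category: median(values) for category, values in by_category.items()}


# Hypothetical rating form with five Human Relations items.
scale = {"i1": 8.2, "i2": 6.5, "i3": 3.1, "i4": 7.4, "i5": 2.0}
marked = {
    "i1": "describes fairly well",
    "i2": "describes fairly well",
    "i4": "describes fairly well",
    "i3": "partially applies",
    "i5": "does not describe",
}
result = median_scale_values(scale, marked)
```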

The ranks assigned the members of the 
sample in Human Relations ability were 
evaluated by computing scale values for 
the supervisors in each list. Hull’s method 
was used whereby it is assumed that the 
persons ranked are normally distributed 
in this ability. Then each rank is trans- 
lated into a per cent position. The per 
cent positions are in turn translated into 
scale values on a ten point scale on which 
each unit represents .5 of a standard de- 
viation. (Thus it is assumed that all cases 
fall within plus or minus 2.5 standard
deviation units from the mean.) (4, p. 
247 ff.) 
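
The conversion of ranks to scale values may be sketched as follows. This is an approximation under stated assumptions, not a reproduction of Hull's table: the per cent position is taken as 100(R - .5)/N, the customary form of the formula, and the scale value is obtained directly from the normal curve, with the mean placed at 5 and each scale unit equal to .5 of a standard deviation. The function names are hypothetical. `statistics.NormalDist` requires Python 3.8 or later.

```python
from statistics import NormalDist


def percent_position(rank, n_ranked):
    """Per cent of the group standing above the midpoint of this rank.

    Assumes the customary formula 100 * (R - 0.5) / N, with rank 1 = best.
    """
    return 100.0 * (rank - 0.5) / n_ranked


def scale_value(rank, n_ranked):
    """Approximate ten-point scale value for a rank (0 = low, 10 = high)."""
    proportion_above = percent_position(rank, n_ranked) / 100.0
    # Normal deviate below which (1 - proportion_above) of the cases fall.
    z = NormalDist().inv_cdf(1.0 - proportion_above)
    # Mean at 5; one scale unit = 0.5 SD; clipped to plus or minus 2.5 SD.
    return min(10.0, max(0.0, 5.0 + z / 0.5))


# Scale values for five supervisors ranked 1 (best) through 5.
values = [scale_value(rank, 5) for rank in range(1, 6)]
```

Under these assumptions the middle rank of five receives a value of 5, and the first and last ranks fall equally far above and below it.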

An evaluation of the data obtained 
from the detailed rating forms revealed 
that the median scale values of the items 
checked gave a more reliable indication 
of the supervisors’ abilities than did the 
numerical ratings obtained from the 
same forms. The correlations between the 
ratings obtained from half of the judges 
and those from the other half of the 
judges on the same ratees were .68 for the 
scores derived from the check-list and .44 
for the numerical ratings. Applying the 


HERBERT H. MEYER 


Spearman-Brown prophecy formula, the
estimated coefficients of reliability for
the pooled ratings of all the judges are
.81 for the check-list scores and .61 for
the numerical ratings.
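
The stepped-up coefficients can be checked with a short sketch of the Spearman-Brown formula, r_k = k r / (1 + (k - 1) r), taking k = 2 because each correlation is between two halves of the judges. The function name is hypothetical.

```python
def spearman_brown(r_half, k=2):
    """Estimated reliability when the number of judges is multiplied by k."""
    return k * r_half / (1 + (k - 1) * r_half)


# Half-versus-half correlations reported in the text.
check_list = spearman_brown(0.68)  # scores derived from the check-list
numerical = spearman_brown(0.44)   # numerical ratings
```

Rounded to two places these give .81 and .61, and the same formula applied to the half-versus-half correlation of .77 for the rankings gives .87.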

The correlation coefficients presented 
in Table I give an estimate of the degree 
to which the raters’ general impressions 
of the supervisors may have influenced 
their ratings in the different areas of su- 
pervision. These coefficients were com- 
puted from a representative random 
sample of 50 completed rating forms from 
all ratings obtained. The correlations be- 
tween the scale values derived from the 
items endorsed in the check-list are gen- 
erally lower than those between the nu- 
merical ratings made in each area. These 
correlations may not be due entirely to 
the influence of general impression or 
“halo,” since there is no objective evi- 
dence to show that these three areas are 
completely independent of each other. 

Some correlation between the ratings 
in each area and the over-all rating is to 
be expected, since over-all ability as a 
supervisor is to a large degree a com- 
bination of ability in the different areas 
of supervision. It is interesting to note 


TABLE I

INTERCORRELATIONS BETWEEN THE DIFFERENT
PARTS OF THE RATING SCALE

Rating Scores Derived from the Check-List (N = 50)

                   Job         Adminis-
                   Knowledge   trative
Administrative       .23
Human Relations      .33         .34

Numerical Ratings (N = 50)

                   Job         Adminis-    Human
                   Knowledge   trative     Relations
Administrative       .55
Human Relations      .32         .64
Over-all             .41         .65         .82

that the correlation is lowest between
ability in the Job Knowledge area and 
over-all ability and highest between 
ability in the Human Relations area and 
the over-all rating. This may be due in 
part to the order of presentation of the 
three areas in the rating scale. (The 
statements and numerical rating for the 
Job Knowledge area were placed first 
in the scale, the Administrative area was 
placed next, and the Human Relations 
area last.) It could also be interpreted 
as an indication that ability in the Hu- 
man Relations area is the most important 
factor in over-all ability as a supervisor. 
However, since the order of presenta- 
tion of the three areas was the same in 
all the rating forms, no control of the 
former condition was provided.* 

The estimated reliability of the scale 
values derived from the ranks assigned 
the supervisors is somewhat higher than 
the indications of their ability derived 
from the numerical indices provided by 
the detailed rating form. The correlation 
between the ranks assigned the super- 
visors by half the judges with the ranks 
assigned the same supervisors by the 
other half of the judges is .77. Applying 
the Spearman-Brown formula, the esti- 
mated reliability of the average ranks 
assigned by all judges is .87. 


2. Determination of a Criterion 
Value for Each Supervisor 


Although the statistical evaluation of 
the data obtained with the detailed 
rating form indicated that the indices of 
supervisory ability provided were reason- 
ably reliable, the use of any one index 
or of any statistical combination of two 


*The Human Relations area was placed last
in order that the completing of the first two 
parts of the rating form might provide some 
training for the raters which would help them 
to complete the Human Relations section more 
accurately. 
or more indices seemed to be needlessly
wasting much valuable material which 
the completed rating forms provided. The 
judges’ responses to the items in the form, 
and especially the differences between the 
way the same judges, rating several super- 
visors, answered the items for those su- 
pervisors, indicated that the information 
provided by the rating forms was more 
reliable than the statistical evaluation
indicated. This can be explained by
several factors.


In the first place, there seemed to be a con- 
siderable amount of “generosity” error in the 
ratings which proved to be very difficult to
minimize. There seemed to be a tendency for 
many raters to compensate for giving a super- 
visor a poor rating in one part of the scale by 
giving him a good rating in another part. This 
tendency was anticipated by the experimenter, 
but it was hoped that the Job Knowledge and 
Administrative parts of the scale would take care 
of that. That is, it was felt that if a man was 
weak in the Human Relations area, the rater 
might feel freer to indicate this if he had a 
chance to indicate the supervisor’s strong points 
also in the other areas. However, many of the 
raters gave contradictory indications of a
supervisor’s ability within the Human Relations part
of the scale. 

Some of the raters differentiated between the 
abilities of the supervisors they rated in the 
check-list part of the scale, but made few
distinctions between them in the numerical ratings.
Others made very definite distinctions on the 
numerical rating, but checked only the favorable 
items on the check-list for all ratees. Still others 
gave contradictory indications using the “par- 
tially applies” category; thus, they indicated that 
the favorable items in the check-list described 
fairly well the supervisor being rated, then 
indicated that many of the unfavorable items 
“partially applied” for that supervisor. 


To discard all ratings on which some 
contradictory evidence was given as to a 
supervisor’s ability seemed to be a need- 
less waste of data. Therefore, it was 
decided to code the ratings, considering 
all of the information which the com- 
pleted rating form provided, in much 
the same manner that a researcher con- 
ducting an opinion survey would code 
information obtained in an interview. 
However, in this situation the information
available was of a much more
quantitative nature than that usually ob- 
tained in an interview, so that the coding 
could be done on a more objective basis. 


Considering only the Human Relations part, 
each completed rating form was assigned a letter 
code of A, B, C, D, or E, representing degrees 
of ability, as indicated by the data provided, from 
good (A) to poor (E) respectively. In the
majority of cases the codes were assigned mainly
on the basis of the differences between the re- 
spective ratees indicated by the ratings of the 
same rater. The responses to the items in the 
check-list, the median scale value of the items 
marked, and the numerical rating assigned were 
recorded for convenience on a special form on 
which the scale values of the items were 
arranged in rank order from high to low. Thus, 
the data provided by the form were summarized 
in such a way that it was easy to evaluate all of 
the information at a glance and to compare the 
ratings assigned to different supervisors by the 
same rater. The investigator found it relatively 
easy to assign codes which seemed to be
satisfactorily reliable to over 90 per cent of the
ratings. A few of the ratings had to be discarded 
because it was impossible to classify them with 
reasonable certainty. This was especially true in 
cases where one rater, who rated only two or 
three supervisors, indicated no differences be- 
tween them, by marking all the favorable items 
in the check-list and assigning a numerical rat- 
ing of 7 to each. 

In order to check the reliability of the code 
letters assigned, another member of the In- 
dustrial Psychology Division of the Company, 
who was unfamiliar with the data obtained on 
the completed forms, was asked to code all of 
the ratings obtained in one department (repre- 
senting about one-fifth of all ratings obtained). 
He was given about 20 minutes of oral instruc- 
tion in addition to brief written directions as 
to the criteria for assigning the code letters. 

The code letters assigned by the second coder 
correlated .925 with those assigned by the
experimenter. In no case did the two coders differ
by more than one classification. Both coders felt 
that this correlation could be further improved 
if the second coder were given additional train- 
ing in the procedures used by the first. How- 
ever, it was felt that the correlation obtained 
indicated that the basis for assigning codes was 
satisfactorily objective. 
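
A check of this kind may be sketched as follows, assuming that the letters are given equally spaced numerical values (A = 5 through E = 1) and that a product-moment correlation is computed; the text does not state which coefficient was used. The two lists of codes are hypothetical.

```python
from math import sqrt

# Assumed equally spaced numerical values for the code letters.
CODE_TO_SCORE = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def coder_agreement(first_coder, second_coder):
    """Return (correlation, largest difference in classifications)."""
    x = [CODE_TO_SCORE[c] for c in first_coder]
    y = [CODE_TO_SCORE[c] for c in second_coder]
    largest_gap = max(abs(a - b) for a, b in zip(x, y))
    return pearson_r(x, y), largest_gap


# Hypothetical codes assigned to the same ten rating forms by two coders.
first = list("ABCDEABCDE")
second = list("ABCDEBBCDD")
r, gap = coder_agreement(first, second)
```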


The code letters were translated into 
per cent positions (as would be ranks
from one to five), and these were in turn
translated into scale values by means of 
Hull’s table. Consequently, these data 
were comparable to the scale values as- 
signed the supervisors on the data ob- 
tained by the method of rank order. 

All of the scale values, computed from 
both the codes and rankings, available 
for each supervisor were tabulated on a 
final Summary Data Sheet for Ratings. 
Thus all of the scale values for a super- 
visor determined from the ratings by su- 
periors, as well as all of the scale values 
determined from the rankings by depart- 
ment officials, were listed separately for 
each supervisor in the sample. The num- 
ber of scale values available for each su- 
pervisor ranged from two to fourteen 
with a median number of 5.7. A final 
criterion value was computed for each su- 
pervisor by averaging the two thirds of 
the scale values that were in closest agree- 
ment. Where the two thirds of the scale 
values in closest agreement did not fall 
within a range of two scale units (one 
standard deviation), the cases were 
dropped from the final sample.* It was 
felt that it would be needlessly adding 
unreliability to the data to include in the 
study the subjects on whom there was 
little agreement among the judges as to 
their ability. 
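
The rule for deriving the criterion value can be sketched as below. Two points are assumptions, since the text does not settle them: "two thirds" is rounded up to a whole number of values, and "closest agreement" is read as the group of that size having the smallest range. The function name and the example values are hypothetical.

```python
from math import ceil


def criterion_value(scale_values, max_range=2.0):
    """Average the two thirds of the values in closest agreement.

    Returns None when even the closest two thirds span more than
    max_range scale units (two units = one standard deviation),
    in which case the supervisor would be dropped from the sample.
    """
    ordered = sorted(scale_values)
    k = ceil(2 * len(ordered) / 3)  # assumed rounding of "two thirds"
    # Every run of k adjacent values in the sorted list is a candidate.
    windows = [ordered[i:i + k] for i in range(len(ordered) - k + 1)]
    closest = min(windows, key=lambda w: w[-1] - w[0])
    if closest[-1] - closest[0] > max_range:
        return None
    return sum(closest) / len(closest)


# Six hypothetical scale values, one of them far out of line.
retained = criterion_value([4.0, 5.0, 5.5, 9.0, 5.2, 4.8])
# Three hypothetical values on which the judges disagree widely.
dropped = criterion_value([1.0, 5.0, 9.0])
```

In the first example the deviant value of 9.0 is excluded and the remaining closest four values are averaged; in the second no two values agree within two scale units, so no criterion value is assigned.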

This average scale value of the ratings 
of each retained member of the sample 
constituted the criterion measure of lead- 
ership ability against which the tests and 
questionnaires administered to the super- 
visors were validated. 


* As indicated in Chapter II, a total of 17 cases 
had to be discarded from the sample because it 
was impossible to obtain a reliable indication of 
ability of those supervisors in the human rela- 
tions aspect of their job. 
CHAPTER IV


THE DEVELOPMENT OF OBJECTIVE MEASURES OF 
LEADERSHIP ABILITY 


A. THE DEVELOPMENT OF A MULTIPLE-
CHOICE PROJECTIVE TEST OF
SOCIAL ATTITUDES


The problem with which the
experimenter was confronted in the
development of the multiple-choice
projective test was to devise a measure which
would reveal a person’s “true” attitude 
toward others. Since psychologists gen- 
erally believe attitudes to be an impor- 
tant determinant of one’s perceptions, it 
was felt that a measure of differences in 
the way particular social situations were 
reacted to should be a reflection of dif- 
ferences in social attitudes. That is, a 
test which would be a direct measure of 
perceptions, the way in which particular 
stimulus situations were organized or in- 
terpreted, would be, thus, an indirect 
measure of attitudes. 

On the basis of the preliminary study 
of good and poor work-group leaders in 
the Company, there seemed to be evi- 
dence to indicate that the good leader's 
attitude toward other persons did differ 
from that of the poor leader. The good 
leader tended more to regard the motives, 
feelings, and goals of the members of his 
work group. On the other hand, the poor 
leader seemed to be more likely to regard 
others in relation to his own needs. He 
was less likely to consider the reason for 
the behavior of other persons than he was 
to consider the effect of their behavior 
on the achievement of his own goals. Or, 
it might be stated simply that the good 
supervisor was inclined to look at the 
other person’s point of view in his inter- 
personal contacts. Thus, it follows that a 
test to measure differences in the way
certain social situations are perceived
should be related to performance as a 
leader. 

As a first attempt at a measure of this 
rather complex quality, a test was con- 
structed which presented supervisor- 
employee problem situations. The testee 
was asked to choose the course of action 
which should be taken by the supervisor 
in each situation and to indicate the 
reasons that best explained why this 
course of action should be taken. The 
test was scored with an a priori key to 
measure the degree to which each alter- 
native indicated the presence of the 
quality described above. It was felt that 
the reasons which the testee gave for 
taking the courses of action would reveal 
his true social attitudes.


The test was administered to three groups for 
pre-testing. One of these groups consisted of 
eleven Production Department employees in the 
Company who were applying for supervisory 
positions. A second group consisted of forty-one 
Sales Department employees and supervisors who 
were participating in a related study. The third 
group consisted of twenty-six sub-foremen em- 
ployed by the City of Detroit. 

The results obtained with the test were gener- 
ally unsatisfactory. In the first place, there was 
very little “spread” in the answers. That is, most 
of the testees checked the same alternatives. 
Usually these were the correct answers according 
to the a priori key. Secondly, what true score the 
test did seem to yield was highly related to scores 
on the general mental ability test, but unrelated 
to any criterion of “human relations” ability. 
Other evidence, such as comments made by the 
testees, also indicated that the test was not 
measuring these persons’ true social attitudes, 
but only their knowledge of how the problem 
situations should be handled according to prac- 
tices recommended in textbooks and articles on 
supervision. 


The test had one outstanding merit— 
face-validity. Judging from the favorable 
reaction of the applicants and sub-
foremen to the test, it was felt that it 
would be highly desirable to include this 
characteristic, if at all possible, in any 
test for supervisors. Therefore, a new test 
was developed incorporating somewhat 
similar items. However, this new test was 
designed to function as a projective test. 
It was labeled Social Judgment Test for 
Supervisors.*

Ostensibly, the purpose of the test 
was to measure the ability to size up per- 
sons from brief descriptions of their 
personalities, and to predict their reac- 
tions to particular social situations. That 
is, instead of asking the testee how he 
would react to certain problem situa- 
tions, he was asked to predict the be- 
havior of another person, who was de- 
scribed briefly. Thus the testee could at- 
tribute to another person certain social 
attitudes, as they were reflected in the 
person’s reaction to the problem situa- 
tions. It was hoped that the testee, in 
predicting the behavior of others, would 
reveal his own attitudes through pro- 
jection. 

In order that the test might function as a 
projective instrument, it was felt that the de- 
scriptions of the persons whose behavior was to 
be predicted should be as unstructured as pos- 
sible. However, in administering trial forms of 
items, it was found that the testees objected if 
the descriptions were too unstructured. They 
complained that there was no basis for prediction. 
On the other hand, if the descriptions were too 
complete, little projection was attained. There- 
fore, a happy medium had to be achieved. It 
was attempted to describe the persons well 
enough to give the testee an opportunity to feel 
that he could size up the persons, yet to make
the descriptions unstructured enough to permit
almost any interpretation, so that all of the 


*The name of the test was changed later to 
Human Relations: A Test of Ability to Predict 
Human Reactions. It was felt that “Social Judg- 
ment” could have a connotation which might 
put some applicants in the wrong frame of mind 
for regarding the items in the test as it was de- 
sired that they should be regarded. 
alternatives to the questions might seem
plausible to some testees. It was also attempted to
describe the persons in such a way that the ma- 
jority of supervisors could easily identify with 
them. 

In order that the task of prediction should not 
seem to be entirely impossible to the conscien- 
tious testee, some of the alternatives were phrased 
in such a way that a direct clue for responding 
to the item was given in the description. This 
was also done in order to remind the testee re- 
peatedly that he was to predict the behavior of 
the person described. Such alternatives were not 
scored. 

As an example, one of the descriptions of a 
person whose behavior the testees were asked 
to predict is reproduced from the test as follows: 


Questions 27 through 37 refer to the following 
paragraph. 

Harry Maynard is a senior accountant for a 
large paper company. He is 42 years old, mar- 
ried, and has two children of school age. His 
favorite recreation is fishing. 

Harry started as a messenger, learned ac- 
counting on his own, and worked his way up. 
He has only a high school education, although 
most of the other accountants are college 
trained. Nevertheless, he gets along with the 
others very well and he is well liked by them. 
Harry is a good accountant and he likes his 
work very much. 


Questions such as the following were then pre- 
sented referring to the description: 


29. Harry’s supervisor is near the retirement
age. Most of the men will be glad when he
retires because he is so grouchy. How
would you expect Harry to feel toward
him?

(a) He probably agrees with the
rest of the men.
(b) He probably feels that he
might be grouchy too if he
were as old as the boss.
(c) He probably tries to avoid
the boss.
(d) He probably figures that
something must be troubling
the boss.

34. One of the men in the department, Joe
Smock, borrowed 10 dollars from Harry
and promised to return it in a week. When
the week was up, Joe told Harry he could
pay back the 10, but that he would like
to keep the money for another week if
it were all right. Although Harry would
have liked to get the money then, he didn’t
need it. What would you expect Harry to
have told him?

(a) “That’s all right.”
(b) “I'd rather have the money
right now if you don’t mind.”
(c) “Sure Joe. Is there anything
else I can do to help?”
(d) “I'd like to but I’m short
myself.”


36. The group leader, who is in charge in the
supervisor’s absence, is a young, college
trained man with less experience than
Harry and some of the others. When the
man was appointed group leader, Harry
thought to himself,

(a) “I hope he makes out all
right.”
(b) “The boss is making a
mistake in not appointing the
senior man.”
(c) “That fellow isn’t qualified
for the job of group leader.”
(d) “I like to see a young fellow
get ahead.”


The test was keyed on the basis of the 
judgments of “experts.” The experts con- 
sisted of eight psychologists, all of whom 
were experienced in the fields of person- 
nel or clinical psychology. They were 
asked to judge the degree to which each 
alternative answer to the questions indi- 
cated the presence of the social attitude 
described above. The directions given
to them for judging the items were writ- 
ten out in detail. Only those alternatives 
were keyed on which there was agree- 
ment in the marking of the item by at 
least 75 per cent of the judges. About 
three-fourths of the alternatives met this 
criterion. 
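
The keying rule can be sketched as below. The marks used by the judges ("+", "0", "-") and the alternative labels are hypothetical stand-ins, since the categories actually used are not given here; only the 75 per cent agreement requirement is taken from the text.

```python
from collections import Counter


def key_alternatives(judgments, min_agreement=0.75):
    """Return {alternative: keyed mark} for alternatives with enough agreement.

    judgments: dict mapping alternative id -> list of marks, one per judge.
    An alternative is keyed only when the most frequent mark was given by
    at least min_agreement of the judges; otherwise it is left unscored.
    """
    key = {}
    for alternative, marks in judgments.items():
        mark, votes = Counter(marks).most_common(1)[0]
        if votes / len(marks) >= min_agreement:
            key[alternative] = mark
    return key


# Hypothetical markings by eight judges for three alternatives.
judgments = {
    "29a": ["-"] * 7 + ["0"],      # 7 of 8 agree: keyed
    "29b": ["+"] * 6 + ["0"] * 2,  # 6 of 8 agree (75 per cent): keyed
    "29c": ["-"] * 5 + ["0"] * 3,  # 5 of 8 agree: not keyed
}
key = key_alternatives(judgments)
```

With eight judges, the requirement amounts to agreement among at least six of them.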

The test was administered for pre-test- 
ing to a group of thirty-five supervisors 
employed by the City of Detroit. The 
criterion data which could be obtained 
as to the leadership ability of the testees 
were unsatisfactory. Nevertheless, there 
was evidence that this form of the exam- 
ination was far more promising than the 
earlier form, where the testee was asked 
to indicate how he would handle the 
problem situations. Judging from the
differences in the responses to some of the
items which were the same in the two 
forms, the projective form of this test 
was not measuring what the testees 
thought should be done in the situations. 
On the pre-test of the first form, the
responses of the testees were very much
the same (and usually “correct”). How- 
ever, the responses to the items varied 
considerably in the second form of the 
test, designed to act as a projective. 

A few of the items were altered on the 
basis of the pre-test results before the 
test was administered to the experimental 
sample of 190 supervisors in the Com- 
pany. In administering the test to this 
group (the experimental sample), the in- 
vestigator read the directions and elab- 
orated where questions arose. In every 
testing session an additional warning was 
given that in order to do their best, the 
testees must be objective in predicting 
the reactions of the persons described, 
and that it would be necessary to guard 
against the tendency to let their own 
personalities enter into the situations. It
was felt that this test would function 
best as a projective only if the testees 
tried conscientiously to predict the be- 
havior of another person. 


B. THE DEVELOPMENT OF TWO TESTS
EMPLOYING FORCED-CHOICE ITEMS FOR 
MEASURING SOCIAL ATTITUDES 


These tests were constructed on the 
basis of the hypothesis that a person’s 
attitude or frame-of-reference in dealing 
with people could be determined from 
the choices that he makes between cer- 
tain alternatives that are approximately 
equal in affective value but different in 
their general reference. One of these 
measures was called the Supervisor's 
Opinionnaire. 

Part I of this test was constructed by listing 
rules of advice to supervisors in groups of three. 
The testee was asked to rank the three rules 
presented in each item in the order in which he 
judged that the different rules should be empha- 
sized in the development of new supervisors. An a 
priori key was developed on the basis of experts’
judgments. The items were scored as to the
degree to which each rule indicated the
importance of the supervisor's considering the
points-of-view of others.

An empirical scoring key was also constructed 
for this test based on differences in responses to 
the items of high and low groups on the criterion 
measure of human relations ability. This analysis 
was made for the members of Sample A. The 
tests for the members of Sample B were then 
scored with this key for a cross-validation. 
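
The general procedure of deriving a key on one sample and applying it to another may be sketched as follows. The weighting scheme shown (plus or minus one point for any response whose proportion of endorsement differs between the high and low groups by at least a chosen amount) is an assumption made for illustration; the text does not state how the responses were weighted. All names and data are hypothetical.

```python
def build_empirical_key(high_group, low_group, min_difference=0.20):
    """Derive response weights from high and low criterion groups (Sample A).

    Each group is a list of answer sheets; each sheet maps item -> response.
    Returns {(item, response): +1 or -1} for discriminating responses.
    """
    def proportions(group):
        counts = {}
        for answers in group:
            for item, response in answers.items():
                counts[(item, response)] = counts.get((item, response), 0) + 1
        return {pair: n / len(group) for pair, n in counts.items()}

    p_high, p_low = proportions(high_group), proportions(low_group)
    key = {}
    for pair in set(p_high) | set(p_low):
        difference = p_high.get(pair, 0.0) - p_low.get(pair, 0.0)
        if abs(difference) >= min_difference:
            key[pair] = 1 if difference > 0 else -1
    return key


def score(answers, key):
    """Score one answer sheet (e.g. from Sample B) with the derived key."""
    return sum(key.get((item, response), 0) for item, response in answers.items())


# Hypothetical Sample A groups.
high = [{"q1": "a", "q2": "b"}, {"q1": "a", "q2": "c"}]
low = [{"q1": "c", "q2": "b"}, {"q1": "c", "q2": "c"}]
key = build_empirical_key(high, low)

# Hypothetical Sample B answer sheets scored for cross-validation.
sample_b_scores = [score(sheet, key) for sheet in
                   [{"q1": "a", "q2": "b"}, {"q1": "c", "q2": "c"}]]
```

Because the key is built only from Sample A, its correlation with the criterion in Sample B is not inflated by chance differences peculiar to the sample on which it was derived.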

Part II of the test was constructed by pre- 
senting a list of brief descriptions of supervisory 
applicants, followed by several questions refer- 
ring to the men described. Such questions as 
the following were asked: 

1. Choose the four men whom you would hire 
as supervisors. Indicate the order of your 
preference. 

2. Choose the two men you would most like 
to work under. 

3. Choose the two men whom you would ex- 
pect to get the most work out of employees. 

4. Choose the two men whom you would ex- 
pect to get the least work out of em- 
ployees. 

5. Which of the men is most like yourself?

No a priori key was used for this part. It was
scored on the basis of differences between the 
responses to these items of good and poor super- 
visors in Sample A. Sample B was used for cross- 
validation. This part was added to the Super- 
visor’s Opinionnaire mainly for the purpose of 
exploring the possibilities of this type of item for 
predicting leadership success. 

Part I of the Supervisor’s Opinionnaire was 
pre-tested by administering to a group of 131 
foreman applicants, sub-foremen, and super- 
visors employed by the City of Detroit. The 
scores based on the a priori key were found to 
have satisfactory reliability. The estimated re- 
liability coefficient (odd-even) was .83.


The second test, called Description of 
Supervisors, was constructed on the basis 
of a principle used by Thurstone in one 
of the tests included in his Factorial 
Study of Perception (18). 


A list of adjectives was compiled, describing 
personality traits that were either clearly social 
or individualistic in their reference. For example, 
an adjective such as “tactful” would be classi- 
fied as a social trait, while the word “industri- 
ous” would be classified as an individualistic 
trait. These adjectives were then investigated in 
the published word-lists in order to determine 
their familiarity to the average person. Un- 
familiar words were eliminated. Approximately 
eighty, forty of each type, were selected for in- 
clusion in the test. 

The test items were constructed by pairing
words, one from each group, that were judged
by the investigator to be of approximately equal
affective value. The first edition of the test con- 
sisted of 135 such pairs, along with 15 items in 
which two social or two individualistic traits 
were paired together as “jokers” to make the 
pattern which the other items presented more 
difficult to detect. The testee was asked to choose 
the one word of each pair which he considered 
to describe the more desirable trait for a super- 
visor. The test was scored by giving one point 
credit for every desirable social trait and every 
undesirable individualistic trait chosen. 

The test was administered to several groups. 
These included supervisory applicants employed 
by the City of Detroit and several groups of 
college students. The 110 items which correlated 
highest with total score were selected for the 
final edition of the test. This form was pre-tested 
by administering it to a group of 131 foreman 
applicants, sub-foremen, and supervisors em- 
ployed by the City of Detroit. A reliability check 
indicated that the test was a very consistent 
measure of whatever it was measuring. The
estimated reliability coefficient was .96 (split-half
and Kuder-Richardson). The scores correlated 
negatively (—.26) with scores on a test of general 
mental ability. 
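
The selection of items by their correlation with total score can be sketched as below. It assumes that each item is scored 1 or 0 and that the total includes the item itself (an uncorrected item-total correlation); neither detail is specified in the text, and the small data set is hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation; 0.0 if either variable does not vary."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_items(responses, n_keep):
    """Return the indices of the n_keep items correlating highest with total.

    responses: list of answer sheets, each a list of 0/1 item scores.
    """
    n_items = len(responses[0])
    totals = [sum(sheet) for sheet in responses]
    item_r = [pearson_r([sheet[j] for sheet in responses], totals)
              for j in range(n_items)]
    ranked = sorted(range(n_items), key=lambda j: item_r[j], reverse=True)
    return sorted(ranked[:n_keep])


# Four hypothetical testees answering three items.
responses = [[1, 1, 0], [1, 1, 1], [0, 0, 1], [0, 0, 0]]
kept = select_items(responses, 2)
```

In the study the same step would retain 110 of the items; here the two items that follow the total score most closely are retained and the third is dropped.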


C. THE DEVELOPMENT OF A PERSONAL
DATA INVENTORY 


A survey of related studies indicated 
that good and poor work-group leaders 
might differ in their interests and in 
certain items of their background or per- 
sonal history. Therefore, the Personal 
Data Inventory was constructed to probe 
these areas. 


The individual items were constructed largely 
on the basis of indications from the results of 
related studies or from areas of investigation 
suggested by the general literature on leader- 
ship. For example, in several studies it had been 
found that school leaders differed significantly 
from non-leaders in their record of participation 
in sports and related activities. Therefore, items
were included in Part I of the inventory to in- 
vestigate possible differences in this area among 
the good and poor work-group leaders. 

As another indication of interests, in a num- 
ber of questions the supervisors were asked to 
choose between certain occupations, presented in 
groups of three. These items were constructed 
on the basis of the hypothesis that good leaders 
would prefer working with people. Somewhat 
similar and related hypotheses served as a guide 
to the construction of items in Part II, titled
“Self Description.” Each of these items consisted
of a group of three descriptive statements such
as the following example:

1. am quick to ask questions.
2. have respect for opinions of others.
3. stick to whatever I start.

In these items, as well as the items dealing with
occupational choices, the supervisors were asked 
to rank the alternatives by indicating their first 
and last choices in each group of three. In this 
way more responses were obtained to the same 
number of items than would be the case if only 
the first choice were marked in each group. 
The items in Part III, titled “Personal Data,” 
consisted of questions concerning one’s
background or personal history. The following are
examples of the type of questions asked:*

82. Regarding discipline, how would you
describe the way your parents treated you as
a child?
1. very strict
2. firm, but not harsh
3. usually allowed to have my own way
4. had my own way about everything
5. inconsistent (sometimes strict,
sometimes lax)

98. Have you been lucky or unlucky in the
friends you have made?
1. very unlucky
2. rather unlucky
3. neither unlucky nor lucky
4. rather lucky
5. very lucky

Many of these items were collected from
inventories used in previous studies, and other
items were constructed from ideas suggested in 
the literature on leadership and from hypotheses 
intimated by personality theory to be worth while 
investigating. For example, in some studies it 
had been found that the good supervisors were 
better educated than were the poor (7, 19). It 
seemed reasonable that this difference would be 
less pronounced if ability in the human relations 
area were used as a criterion rather than over-
all ability as a supervisor. Therefore, an item
was constructed to test this hunch. Other items
were included because it was felt that possible
sources of frustration would be more likely to
have been present in the early lives of the poor
supervisors than would be the case with the
good supervisors. In a similar manner, the in-
clusion of each item was based on some
hypothesis.

*The materials constructed for use in this
study are not presented in full in order to con-
serve space and to prevent possible misuse of
some of the tests. The author will be glad to
arrange to make the materials available to quali-
fied persons on request.

Part IV of the inventory consisted of questions 
concerning one’s health. These were included be- 
cause “Health” is often mentioned in books and 
articles as an important consideration in the 
selection of leaders. Furthermore, it was felt that 
certain responses to some of the items may reflect 
indirectly particular personality characteristics. 

An original form of the inventory was pre- 
tested by administering it to a group of thirty- 
five sub-foremen and foremen employed by the 
City of Detroit. Although it was felt that the 
foremen might refuse to answer many of the 
items because of the very personal areas which 
they probed, this actually occurred in only a 
very few cases. On the basis of the pre-test, a 
few of the items were altered slightly because 
of evidence that they were not clearly under- 
stood. 


The final form of the inventory was
not administered to the experimental
sample in the Company at the regular
testing sessions. As was mentioned in
Chapter II, they were asked to mail in
the completed form. This procedure was 
followed because it was desired by those 
responsible for the study that the super- 
visors should not feel compelled to com- 
plete the inventory, especially items of 
a personal nature. It was felt that by 
allowing them to complete the form at 
their convenience, they would feel more 
free to omit items they considered ob- 
jectionable than they would in a testing 
session, where some group pressure may 
have operated. 


Tables II and III on the following
pages present some of the general
results. Table II shows the mean and
standard deviation for each of the tests
and its correlation with the criterion for
each of the sub-groups into which the
total sample was divided. Table III pre-
sents the intercorrelations for all the
tests and the human relations criterion 
measure. 


A. GENERAL RESULTS FOUND FOR THE 
CONTROL VARIABLES 


Two of the control measures obtained, age and 
years of experience as a supervisor, were not 
presented in the tables because they were not 
found to be relevant variables. Neither showed 
a significant correlation with the criterion in any 
of the groups. The only significant relationship 
found between these measures and the tests was 
a negative correlation (—.32) between age and 
the Personnel Test. This is consistent with that 
generally found in other studies. 

Considering the tests in the order of their 
presentation in the tables, it was found that for 
the Personnel Test the Office supervisors scored 
significantly higher than the Trades and Operat- 
ing supervisors. The only statistically significant 
difference between the mean scores in the four 
departments is that for the lower score in the 
Overhead Lines Department as compared to the 
other three. The correlation between the Per- 
sonnel Test scores and the criterion is positive 
in all the groups, although it is not very large. 
The correlation of .19 for the sample as a whole 
is statistically significant at the one per cent level 
of confidence. 

The difference in mean scores between the 
supervisors of the two types of work groups is 
even more significant on the Word Fluency test 
than it is on the Personnel Test. The con- 
siderably higher mean score for the Office super- 
visors might be expected, since they probably 
work much more with written materials. How- 
ever, the most interesting difference found in the 
results of the Word Fluency test for the differ- 
ent sub-groups is that between the Office and 
Trades and Operating groups in the correlation 
of this test with the criterion. The scores on the 
test show a significant relationship to the leader- 
ship criterion for the supervisors of Office 
workers, but no relationship for the supervisors 
of Trades and Operating employees. 


B. RESULTS FOR THE PROJECTIVE TEST
DESIGNED TO MEASURE SOCIAL 
ATTITUDES 


In general, the scores on this test, titled 
the Social Judgment Test for Supervisors, 
show a higher relationship to the cri- 
terion of leadership quality than do any 
of the other variables studied. The cor- 
relation coefficients between these vari- 
ables are statistically significant at the 
one per cent level of confidence in the 
sample as a whole and in each of the 
sub-groups except for the Accounting De- 
partment. It is interesting to note that 
the relationship between this test and 
the criterion is the same in the Trades 
and Operating and Office groups. The 
differences in mean scores on this test 
for the different sub-groups are not sig- 
nificant. 

The results obtained on the individual 
items of the test were analyzed by com- 
paring the responses of the upper and 
lower 27 per cents on criterion score of 
the cases in Sample A. The differences 
in the responses of these groups were sig- 
nificant at the ten per cent level of con- 
fidence or better on sixty-two of the alter- 
natives. Of these, forty-six of the differ- 
ences were in the same direction as the 
test was keyed, three were in the opposite 
direction, and thirteen were not scored 
on the key in its present form. Thus, for
the forty-nine keyed alternatives on which
significant differences were found, 94 per
cent of the differences were in the same
direction as the items were being scored
by the a priori key.

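The comparison of upper and lower criterion groups on a single alternative can be sketched as follows. The monograph does not state which significance test was used, so the two-proportion z test below is an assumption, and the response counts are hypothetical. Only the figures 46 and 3 in the last lines come from the text.

```python
import math


def two_proportion_z(hits_upper, n_upper, hits_lower, n_lower):
    """z statistic for the difference between two independent proportions."""
    p_upper = hits_upper / n_upper
    p_lower = hits_lower / n_lower
    pooled = (hits_upper + hits_lower) / (n_upper + n_lower)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_upper + 1 / n_lower))
    return (p_upper - p_lower) / std_error


def two_sided_p(z):
    """Two-sided probability of a normal deviate at least as large as |z|."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts: 24 of 38 upper-group and 12 of 38 lower-group
# supervisors choose a given alternative.
z = two_proportion_z(24, 38, 12, 38)
significant_at_ten_per_cent = two_sided_p(z) <= 0.10

# Agreement with the a priori key among the keyed alternatives in the text:
# 46 differences in the keyed direction, 3 in the opposite direction.
agreement_per_cent = round(100 * 46 / (46 + 3))
print(significant_at_ten_per_cent, agreement_per_cent)
```

The thirteen unkeyed alternatives are left out of the percentage because no scoring direction exists for them.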
TABLE II
MEAN, STANDARD DEVIATION, AND CORRELATION WITH THE CRITERION
FOR EACH TEST VARIABLE

                                                  Departments               Types of Work Groups
                                       Produc-  Account-     Over-            Trades &
                                          tion       ing      head     Sales Operating    Office     Total
                                          N=56      N=34      N=46                          N=77

Personnel Test            Mean            19.3      21.0      16.4      22.3                22.1      19.6
                          S.D.             5.3                 6.3       7.8       6.0       2.2       7.0
                          r (crit.)                            .24

Word Fluency              Mean            35.3      40.3      35.8      40.5      34.2      42.9      37.6
                          S.D.             9.4                14.5      12.6      11.1      13.6      12.8
                          r (crit.)                                                          .42       .20

Social Judgment or        Mean                     132.0     129.2     131.0     130.7     130.9     130.8
Human Relations Test      S.D.            12.7       9.4      11.9      12.0      11.8      11.8      11.8
(The “projective” test)   r (crit.)        .34       .22       .49       .37       .36       .36       .36

Supervisor’s              Mean            86.0      92.9      82.9      89.3      86.9      88.3      87.4
Opinionnaire              S.D.            16.4      18.4      13.9      16.9      15.7      18.2      16.6
(Forced-choice)           r (crit.)        .06      -.22      -.08       .16       .13      -.06       .03

Description of            Mean                                                    52.5      59.9      55.4
Supervisors               S.D.            23.8      25.5      23.1      20.1      23.3       3.3      23.6
(Forced-choice)           r (crit.)        .03       .20       .00      -.02       .16      -.03       .09

Personal Data             Mean                                                                       49.9*
Inventory

Criterion                 Mean            50.6      55.5      49.9      54.0      51.8      52.9      52.3
                          S.D.            12.8      16.2      12.1      13.6      13.4      14.4      13.8

* The statistics presented for the Personal Data Inventory were computed
in Sample B, the cross-validation sample, only. The figures are based on
a total of 53 supervisors in this Sample who returned their inventories.
The N was considered to be too small to analyze the data by departments.
The Personal Data Inventory was scored on the basis of an item analysis
of the results obtained in Sample A, so that a cross-validation was
necessary in order to obtain meaningful results. All of the other tests
were scored either by a priori keys or by scoring keys developed on
other groups.

The results of the item analysis provided further evidence that a
real relationship existed between the quality that this test was
measuring and the leadership criterion. The fact that the correlation
coefficients between these variables were no higher than they were in
the various groups seemed to be due to several factors. In the first
place, the reliability of the scores on the test was not high. The
coefficient of reliability was estimated to be .58 by the split-half
method (correlating scores on odd numbered items with scores on even
numbered items). Secondly, the lack of a perfectly reliable criterion
measure would lower the correlation coefficient obtained. These two
factors lower the estimated validity of any test. However, a major
factor which seemed to be lowering the obtained validity coefficient
for this test was the peculiar relationship that existed between
scores on the test, scores on the mental ability test, and the
criterion measure.

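The two reliability arguments above can be illustrated with a short sketch. It assumes the usual odd-even procedure; whether the reported .58 was stepped up by the Spearman-Brown formula is not stated in the text, so that function is shown only as an option. The ceiling on validity follows from classical test theory, and the criterion reliability of .80 used in the example is hypothetical, as are the toy item scores.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def odd_even_reliability(item_scores):
    """Correlate each person's total on odd-numbered items with the total on even-numbered items."""
    odd_totals = [sum(person[0::2]) for person in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(person[1::2]) for person in item_scores]  # items 2, 4, 6, ...
    return pearson_r(odd_totals, even_totals)


def spearman_brown(r_half):
    """Estimated full-length reliability from a half-test correlation."""
    return 2 * r_half / (1 + r_half)


def validity_ceiling(test_reliability, criterion_reliability):
    """Largest observed test-criterion correlation that two unreliable measures allow."""
    return math.sqrt(test_reliability * criterion_reliability)


# Toy data: four persons, four items scored 0 or 1 (hypothetical).
toy_items = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 0, 1, 0], [1, 1, 0, 1]]
print(round(odd_even_reliability(toy_items), 3))

# Test reliability of .58 (from the text) with a hypothetical criterion reliability of .80.
print(round(validity_ceiling(0.58, 0.80), 2))
```

Under these assumptions, even a test that measured the trait perfectly apart from random error could not show an observed validity much above the printed ceiling.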

TABLE III
INTERCORRELATIONS AMONG THE TEST VARIABLES AND THE CRITERION

                              Personnel         Word       Social Supervisor’s  Description     Personal
                                   Test      Fluency     Judgment     Opinion-           of         Data
                                                             Test        naire  Supervisors   Inventory*

Personnel Test

Word Fluency

Social Judgment or                               .14
Human Relations Test
(The “projective” test)

Supervisor’s                        .35          .19
Opinionnaire
(Forced-choice)

Description of
Supervisors
(Forced-choice)

Personal Data
Inventory

Criterion                           .19          .20                       .03          .09

The Pearson product-moment correlation
coefficient for scores on the Social Judgment Test
and the Personnel Test was -.02. However, the
correlation ratio (eta) between the two measures
was .47. Correcting this for the number of cases in
the sample and the number of classes into which the
cases were divided, the unbiased correla-
tion ratio (epsilon) was .37. This is significant
at the one per cent level of confidence, indicating
that there is some relationship between these
variables, but that the relationship is curvilinear.
It was found that the persons who scored low
and the persons who scored high on the Person-
nel Test scored lower on the Social Judgment
Test than did the persons who scored in the
middle range on the Personnel Test.

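The contrast between a near-zero product-moment correlation and a substantial correlation ratio can be sketched as follows. The shrinkage formula is Kelley's epsilon, which appears to be the correction described; the number of cases and classes actually used is not given in the text, so the data and class counts below are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_ratio(classes):
    """Eta of y on x, where `classes` holds the y-values falling in each class of x."""
    all_y = [y for group in classes for y in group]
    grand_mean = sum(all_y) / len(all_y)
    ss_total = sum((y - grand_mean) ** 2 for y in all_y)
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2 for group in classes
    )
    return math.sqrt(ss_between / ss_total)


def kelley_epsilon(eta, n_cases, n_classes):
    """Correlation ratio corrected for number of cases and number of classes."""
    eps_sq = 1 - (1 - eta ** 2) * (n_cases - 1) / (n_cases - n_classes)
    return math.sqrt(max(eps_sq, 0.0))


# Hypothetical inverted-U data: low, middle, and high classes on x.
classes = [[1, 2, 3], [4, 5, 6], [1, 2, 3]]
x = [1] * 3 + [2] * 3 + [3] * 3
y = [value for group in classes for value in group]

r = pearson_r(x, y)                      # linear correlation misses the relationship
eta = correlation_ratio(classes)         # correlation ratio detects it
epsilon = kelley_epsilon(eta, n_cases=9, n_classes=3)
print(round(r, 2), round(eta, 2), round(epsilon, 2))
```

Epsilon is always somewhat smaller than eta, and the shrinkage grows as the number of classes approaches the number of cases.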
The scatter plot showing the relationship be- 
tween the Personnel Test and the criterion meas- 
ure revealed that this relationship was somewhat 
funnel-shaped. The correlation between these 
variables was generated mainly by the relation- 
ship in the upper quartile of Personnel Test 
scores. That is, there was little relationship be- 
tween scores on the mental ability test and the 
leadership criterion for persons in the lower 
and middle ranges of mental ability. However, 
there was a tendency for persons who scored 
very high in mental ability to be rated high in 
leadership ability. 


* The correlation coefficients presented for the Personal Data Inventory were computed in Sample B, 
the cross-validation sample, only. (See footnote, Table II.) 


The scores on the Social Judgment Test 
plotted against the criterion scores also presented 
a funnel-shaped distribution, but in the opposite 
direction. Persons who were rated low in leader- 
ship ability were very likely to score low on 
the test. However, persons who were rated high 
were as likely to score rather low as they were to 
score high on the Social Judgment Test. 


The nature of the relationships be- 
tween the three variables described above 
suggested that perhaps the general 
mental ability factor was acting to lower 
the correlation coefficient obtained be- 
tween the scores on the Social Judgment 
Test and the criterion of leadership
ability. An analysis of this correlation 
for scores in different ranges on the Per- 
sonnel Test indicated a trend in this di- 
rection. For example, it was found that 
the correlation between the Social Judg- 
ment Test and the criterion was .44 if 
only the persons who scored below the 
75th centile on the Personnel Test were 
considered. 

This effect seemed to be even more 
pronounced in the group of Office super-
visors, which included most of the per-
sons who scored very high on the Person-
nel Test. If those persons who scored
thirty or above on the Personnel Test 
(thirteen cases) were eliminated from the 
sample, the correlation between the So- 
cial Judgment Test and the criterion for 
the remaining cases (those scoring below 
thirty on the Personnel Test) was .54. 

Because of these relationships it was 
decided that for best prediction of the 
leadership criterion it would be necessary 
to correct scores on the Social Judgment 
Test for Personnel Test scores. This was
done by fitting a curve to the Social Judg- 
ment Test scores plotted on the abscissa 
against the Personnel Test scores plotted 
on the ordinate, then applying the neces- 
sary corrections to make this curvilinear 
relationship rectilinear in a vertical di- 
rection. As a result, the Social Judgment 
Test scores were increased for those per- 
sons who scored either very high or very 
low on the Personnel Test.¹ This correction applied to the scores
resulted in an increase in the correlation between the Social
Judgment Test and the criterion for Office supervisors from .36 to
.45. The same correlation was not changed for the supervisors of
Trades and Operating employees. This may have been due to the fact
that few of these supervisors made very high scores on the Personnel
Test, since it was in this range of scores that the correction was
greatest. For the Accounting Department, in which the supervisors
were mostly of office employees, the application of this correction
resulted in an increase in the correlation between the Social
Judgment Test and the leadership criterion from .22 to .35.

¹ This means that the variance in Social Judgment Test scores which
is accounted for by the variance in Personnel Test scores has been
partialled out. However, the conventional equation for computing a
partial correlation coefficient would not be applicable here because
of the curvilinear nature of the relationship between these two
variables.

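One way to carry out a correction of this kind is sketched below. The form of the curve actually fitted is not reported, so a least-squares quadratic is assumed here, and raising every score to the level of the curve's highest fitted point is likewise an assumption chosen to agree with the statement that scores were increased at the extremes. The data are hypothetical.

```python
def fit_quadratic(x, y):
    """Least-squares coefficients (a, b, c) for y = a + b*x + c*x**2."""
    power_sums = [sum(v ** k for v in x) for k in range(5)]
    cross_sums = [sum((v ** k) * w for v, w in zip(x, y)) for k in range(3)]
    # Augmented normal equations.
    m = [
        [power_sums[0], power_sums[1], power_sums[2], cross_sums[0]],
        [power_sums[1], power_sums[2], power_sums[3], cross_sums[1]],
        [power_sums[2], power_sums[3], power_sums[4], cross_sums[2]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [a - factor * b for a, b in zip(m[row], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def straighten(personnel_scores, judgment_scores):
    """Add to each judgment score the amount by which the fitted curve falls below its peak."""
    a, b, c = fit_quadratic(personnel_scores, judgment_scores)
    fitted = [a + b * v + c * v ** 2 for v in personnel_scores]
    peak = max(fitted)
    return [score + (peak - f) for score, f in zip(judgment_scores, fitted)]


# Hypothetical scores lying exactly on an inverted-U curve.
personnel = [0, 1, 2, 3, 4]
judgment = [1.0, 2.5, 3.0, 2.5, 1.0]
corrected = straighten(personnel, judgment)
print([round(v, 6) for v in corrected])
```

After the correction the expected judgment score no longer depends on the Personnel Test score, which is the sense in which that variance has been removed without using the linear partial-correlation formula.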

C. RESULTS FOR THE FORCED-CHOICE
TESTS DESIGNED TO MEASURE
SOCIAL ATTITUDES


The results obtained with these tests 
were generally negative. As Table II
shows, the correlation coefficients be- 
tween these tests and the criterion were 
low in all the groups. None of these co- 
efficients were found to be statistically 
significant. The only significant corre- 
lations found between these tests and 
other variables were the positive correla- 
tions between the Supervisor's Opinion- 
naire and the Personnel Test, the Word 
Fluency test, and the Personal Data In- 
ventory. 


In both tests the items were analyzed by com- 
paring the responses of the upper and lower 27 
per cents of the supervisors in Sample A on the 
leadership criterion measure. In the test called 
Description of Supervisors there were fewer sta- 
tistically significant differences than would be 
expected by chance. In the Supervisor’s Opinion- 
naire a few more differences were found than 
would be expected by chance, but it was very 
difficult to detect any particular pattern that 
these differences followed. If any general 
tendency at all was present, it seemed to be that 
the supervisors in the low criterion group made 
some choices which would possibly indicate poor 
judgment. For example, in every item where the 
following rule appeared, a significantly greater 
number of this group, as compared to the high 
criterion group, selected this rule as one that 
should be emphasized in training: “Cultivate a 
tone of voice and manner of speech which will 
be listened to.” 

An empirical scoring key was constructed on 
the basis of differences found in the item analysis 
of this test on Sample A, then applied to Sample 
B for cross-validation. The correlation between 
the test scored on this key and the criterion was 
only .12. The fact that this correlation was so
low may have been due in part to the unre- 
liability of the scores based on this key. The 
estimated reliability of the scores, by the split- 
half method, was only .37. 

In Part II of the Supervisor’s Opinionnaire
the only significant differences found were the 
following. The good supervisors (upper 27 per 
cent on criterion) were much more likely to 
describe themselves as being like the supervisor 
who is described as “an intelligent, analytical 
person. He is inclined to study and observe. He 
likes to get at the how and why of things.” On 
the other hand, the poor supervisors (lower 27 
per cent on criterion) indicated a preference for, 
and described themselves as being like the 
supervisor who is described as “careful and 
cautious—makes no snap decisions. He carefully 
plans his work so that it can be carried out 
most efficiently.” These differences were signifi- 
cant at the one per cent level of confidence. 

D. RESULTS FOR THE PERSONAL
DATA INVENTORY


As indicated in Chapter IV, the super- 
visors were permitted to complete the 
Personal Data Inventory on their own 
time and to return it through the mail. 
Approximately 86 per cent were re- 
turned. The proportions returned in the 
Office and Trades and Operating groups 
were almost identical (85.9 and 85.3 per 
cents). The mean age of the members of 
the group who did not return the in- 
ventories was a little higher than the 
mean age for the total sample (48.45 vs.
45.73). The average criterion score for
this group was somewhat lower than
that for the entire sample (47.79 vs.
52.30). Thus, the results obtained on this
inventory are from a group that is a
little more highly selected than the total
sample.

The results obtained on the Personal 
Data Inventory were analyzed by com- 
paring the responses of the supervisors 
rated in the upper and lower 27 per cents 
on the criterion of leadership ability. 
This analysis was made for the total 
sample considered as a whole, and for the 
Office and Trades and Operating groups 
considered separately. For convenience in 
discussing the results, the groups rated 
in the upper and lower 27 per cents on 
the criterion measure of leadership 
ability will be called the “good” and 
“poor” supervisors respectively. 


The most conspicuous of all differences found 
was probably the fact that many more of the 
good supervisors had participated in sports than 
had the poor supervisors. In twenty-one of the 
twenty-five activities listed, more of the good 
supervisors indicated that they then did or that 
they had at one time participated in the activi- 
ties than did the poor supervisors. For most of 
the popular sports, such as tennis, golf, football, 
softball, swimming, track, and bowling, the dif-
ferences were significant at the five per cent
level of confidence or better. The only activities for
which there was a trend in the opposite direction 
were hockey, boxing, and wrestling. In the latter 
two sports the differences were more significant 
than for the former, and especially for the 
supervisors in the Trades and Operating group. 
That is, more of the poor supervisors had par- 
ticipated in boxing and wrestling than had the 
good supervisors. Another reversal was that more 
of the poor supervisors in the Trades and 
Operating group had participated in pool or 
billiards than had the good supervisors. How- 
ever, the trend was in the opposite direction in 
the Office group. 

As for hobbies the results showed that, in gen- 
eral, a greater proportion of the poor super- 
visors in the Trades and Operating group 
participated in these activities than did the good 
supervisors. However, in the Office group, the 
trend was in the opposite direction. As a result 
the differences on these items were not signifi-
cant for the total sample.

A greater proportion of the good supervisors 
indicated a preference for radio programs featur- 
ing music, whereas the poor supervisors pre- 
ferred news broadcasts. On another item a sig- 
nificantly greater proportion of the good super- 
visors in the Office group said that they liked 
“very much” to play with, read or talk to chil- 
dren. 

The results on items dealing with occupational 
choices and items of “Self Description” are
somewhat difficult to describe, since the prefer- 
ences indicated by the groups must be considered 
in relation to the alternatives in each item from 
which the testee was forced to choose. In general, 
the good supervisors indicated a preference for 
occupations such as Red Cross official, politician, 
factory superintendent, professional baseball 
player, industrial relations expert, and life in-
surance salesman; while the poor supervisors
preferred occupations such as traffic police 
sergeant, bartender, sales manager, lighthouse 
keeper, public health official, professional rifle 
shot, research worker in a laboratory, owner of a 
roadside stand, and locomotive engineer. There 
is probably some tendency for the good super- 
visors to prefer occupations in which it is more 
necessary to deal with people, than do the poor 
supervisors. 

In general, fewer significant differences were 
found on the items of self-description than on 
the occupational choices. One of the differences 
found to be very significant was the fact that a 
larger proportion of good supervisors indicated 
that they “frequently make wagers” or “occasion- 
ally make wagers,” while the poor supervisors 
were much more likely to say that they “never 
make wagers.” On another item more of the 
good supervisors chose the alternative, “I am 
well liked by others,” whereas the poor chose 
either “I command the respect of others,” or “I
dislike supervising the work of others.” In the 
Office group the good supervisors indicated that 
they “like to deal with people,” whereas the 
poor chose “like to deal with things.” This differ- 
ence was not significant in the Trades and 
Operating group. 

There were few significant differences on the 
“Personal Data” items in Part III. There was 
some tendency for more of the good supervisors 
to be married, rather than single. The poor 
supervisors had worked for the Company longer, 
although this difference was much greater in the 
Trades and Operating group than in the Office 
group. There was no significant difference in 
the amount of education for the upper and 
lower groups. However, the parents of the good 
supervisors were reported to have had more 
education than those of the poor supervisors. 

Regarding membership in different types of 
social organizations, the high criterion group 
belonged to or had belonged to community 
social organizations or hobby clubs, whereas the 
low criterion group had belonged to labor 
unions. This difference was significant at the one 
per cent level. There was no significant differ- 
ence between the number of times the good and 
poor supervisors had held offices in clubs, ac- 
cording to their report. 

On one personal history item, a larger pro- 
portion of the good supervisors in the Trades 
and Operating group checked “sometimes played 
or worked with parents,” whereas the poor 
checked “often played or worked with parents.” 
The poor supervisors in both groups reported 
that their fathers had punished them more often 
in childhood than had their mothers. 

On other personal items, the good supervisors 
indicated that their parents had been “more 
happy than the average,” whereas the poor 
checked “about average,” for this item. The good 
supervisors signified that they spent from one 
to five per cent of their salary on personal 
recreation, whereas the poor spent more or less 
than this amount. The high group also indi- 
cated that they had been “rather lucky” in the 
friends they had made, whereas the low group 
had been “neither unlucky nor lucky.” 

In Part IV, “Health,” the poor supervisors 
answered the question “How much exercise do 
you get?” with “a considerable amount,” while
the good supervisors checked “some.” As to 
drinking habits, the good supervisors said that 
they “drink some” while the poor said they 
“drink a little” or “never drink.” 


In general, the differences between the 
responses of the supervisors of different 
types of work groups (Office and Trades 
and Operating) to the items in the Per- 
sonal Data Inventory were not marked. 
On many items the differences were sta- 
tistically significant in one group and not 
in the other, but on most of these items 
there was a trend in the same direction 
in the other group. Therefore, for the 
purpose of predicting leadership success, 
the total sample was considered as a 
whole. The differences in responses to the 
items shown by the upper and lower 
criterion groups (top and bottom 27 per 
cents) were determined for the super- 
visors in Sample A, and a scoring key 
for the inventory was constructed on the 
basis of these differences. The inventories 
for the supervisors in Sample B, a repre- 
sentative random sample of the total 
group tested, were then scored by this 
key. The correlation between these scores 
and the leadership criterion scores was 


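The keying and cross-validation procedure described above can be sketched as follows. The monograph keyed items on the basis of differences between the extreme criterion groups; the fixed difference of .20 in proportions used below is a hypothetical stand-in for that rule, and all responses and criterion scores are invented for illustration.

```python
def extreme_groups(criterion, fraction=0.27):
    """Indices of the lowest and highest `fraction` of cases on the criterion."""
    order = sorted(range(len(criterion)), key=lambda i: criterion[i])
    k = max(1, round(fraction * len(criterion)))
    return order[:k], order[-k:]


def build_key(responses, criterion, min_diff=0.20):
    """Weight +1 the alternatives favored by the upper group and -1 those favored by the lower.

    `responses` is a list of sets, one per supervisor, holding the alternatives checked.
    """
    lower, upper = extreme_groups(criterion)
    alternatives = set().union(*responses)
    key = {}
    for alt in alternatives:
        p_upper = sum(alt in responses[i] for i in upper) / len(upper)
        p_lower = sum(alt in responses[i] for i in lower) / len(lower)
        if p_upper - p_lower >= min_diff:
            key[alt] = 1
        elif p_lower - p_upper >= min_diff:
            key[alt] = -1
    return key


def score(response, key):
    """Inventory score for one supervisor; unkeyed alternatives count zero."""
    return sum(key.get(alt, 0) for alt in response)


# Sample A (hypothetical): the key is built here.
sample_a_criterion = [31, 35, 38, 44, 47, 50, 53, 60, 64, 70]
sample_a_responses = [
    {"news"}, {"news"}, {"news"}, {"news", "music"}, {"music"},
    {"news"}, {"news", "music"}, {"music"}, {"music"}, {"music"},
]
key = build_key(sample_a_responses, sample_a_criterion)

# Sample B (hypothetical): scored with the Sample A key, never used to build it.
sample_b_scores = [score(r, key) for r in [{"music"}, {"news"}, {"music", "news"}]]
print(key, sample_b_scores)
```

Scoring Sample A with its own key would capitalize on chance differences, which is why only the Sample B scores are correlated with the criterion.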
E. COMBINING THE TESTS FOR PREDICTING 
LEADERSHIP SUCCESS 


Since the results obtained for super- 
visors of office workers and supervisors 
of trades and operating workers differed 
significantly on the Word Fluency test, 
it was necessary to compute the multiple 
correlation coefficients for each of these 
two groups separately for best prediction 
of the leadership criterion. For the super- 
visors of office workers the criterion could 
be predicted best by combining the fol- 
lowing three measures: (1) the Social 
Judgment Test, (2) the Word Fluency 
test, and (3) the Personal Data Inventory. 
The multiple correlation was .58. For 
supervisors of manual workers (Trades 
and Operating) the multiple correlation 
between the ratings and two measures, 
the Social Judgment Test and the Per- 
sonal Data Inventory, was .48. 

As a further check on the validity of 
the results, the data were analyzed sepa- 
rately for Sample A (two-thirds of the 
total sample) then cross-checked on 
Sample B. It was felt that this not only 
would indicate any differences in the re- 
sults which might have been obtained 
had the study been run on this sub- 
group only, but also would provide an 
estimate of the degree to which the rela- 
tionship found between the criterion and 
the battery score computed in this sample 
would hold up in an independent, comparable sample.² It was found
that the multiple correlations were highest between the criterion
and the same tests as reported above for the total sample for both
the Office and Trades and Operating groups.

For supervisors of office workers the correlation between the three
measures combined and the criterion was .60 for Sample A. Battery
scores were then computed for the cases in Sample B on the basis of
the regression equation computed for Sample A. The correlation
between this battery score and the criterion for the cases in
Sample B was found to be .66.

The same analysis for the Trades and Operating supervisors resulted
in a multiple correlation between the two measures and the criterion
of .46 in Sample A. In the cross validation on Sample B the
correlation between the battery scores and the criterion was found
to be .42.

² The Personal Data Inventory could not be considered in this check
in the same manner as were the other measures. Since the key for this
inventory was based on differences found in Sample A between extreme
groups on the criterion, to score the inventories in Sample A by this
key would not be meaningful. An unrealistically high correlation
would be found between these scores and the criterion scores due to
capitalization on chance differences. Therefore it was felt that the
correlation found between this measure and the criterion in the
cross-check on Sample B would provide a more realistic figure to
assume for Sample A.
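The cross-check of a battery score can be sketched as follows: regression weights are computed in one sample and applied unchanged to the other, where the resulting scores are correlated with the criterion. Ordinary least squares is assumed, and all the scores shown are hypothetical, chosen so that the result can be verified by hand.

```python
import math


def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_weights(predictors, criterion):
    """Least-squares regression weights; the first weight is the constant term."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, criterion)) for i in range(k)]
    return solve(xtx, xty)


def battery_scores(predictors, weights):
    """Weighted composite of the test scores for each case."""
    return [weights[0] + sum(w * v for w, v in zip(weights[1:], p)) for p in predictors]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Sample A (hypothetical): two test scores per case and a criterion score.
tests_a = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
criterion_a = [9, 8, 19, 18, 26]
weights = fit_weights(tests_a, criterion_a)

# Sample B (hypothetical): weights from Sample A are applied without refitting.
tests_b = [(0, 1), (2, 2), (3, 1)]
criterion_b = [4, 11, 10]
cross_validated_r = pearson_r(battery_scores(tests_b, weights), criterion_b)
print([round(w, 6) for w in weights], round(cross_validated_r, 6))
```

With real data the cross-validated correlation ordinarily falls somewhat below the multiple correlation in the fitting sample, so a value that holds up in Sample B is evidence that the weights are not merely fitting chance variation.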
CHAPTER VI 
DISCUSSION AND IMPLICATIONS OF RESULTS 


A. Discussion 


Of the several tests designed to meas-
ure social attitudes, only one
proved to be related to ratings of leader- 
ship ability. Why did scores on this test 
show a significant relationship to the cri- 
terion, while scores on the other two did 
not? 


This may have been due to the fact that this 
test was less susceptible to the effects of facade 
than were the other two. By this it is not meant 
that a person could necessarily get high scores on 
the other two tests by faking his answers; but 
to the extent to which the testee tries to create 
a particular impression by his answers, his score 
will not give a true indication of his attitudes. 
Any reading which the testee had done as to 
supervisory practices, or training courses in this 
area that he had taken, would affect his score. 
For example, it is generally known that the 
modern tendency is for the supervisor in in- 
dustry to give more attention to the workers’ 
problems than was done in past years. Evidence 
was provided in the results obtained with the 
experimental test which preceded the Social 
Judgment Test that most supervisors know what 
should be done according to recommended prac- 
tices. There was also evidence to show that the 
more intelligent people knew better what should 
be done, than did the less intelligent persons. 
The rather high correlation obtained between 
the Supervisor’s Opinionnaire and the test of 
mental ability (see Table III) may be an indi- 
cation that a similar effect was operating with 
this test. 

The same factor may have been operating to 
reduce the significance of differences found on 
the “Self Description” part of the Personal Data 
Inventory. The supervisors may have been trying 
to determine the most acceptable answers, rather 
than indicating their true feeling. In the section 
dealing with occupations, where the choices were 
less personal in their implication, many more 
significant differences were found. 


Another question that arises is to what 
extent the Social Judgment Test actually 
functioned as a projective test to measure 
differences in social attitudes? 


In the first place, the best evidence for this is 
probably the fact that scores on the test did show 
a significantly higher relationship to the criterion 
measure of human relations ability in super- 
vision than would be expected by chance. Sec- 
ondly, the test was scored on the basis of judg- 
ments of experts as to the degree to which each 
alternative answer to the items reflected certain 
social attitudes of the person whose behavior was 
predicted. Thirdly, it was shown that the items 
in the test were responded to differently when 
the testee was asked to predict another person’s 
reaction to situations rather than to indicate how 
he would react or what should be done in the 
situations. 

These data do not provide conclusive evidence 
that in making this prediction the testee was 
projecting his own attitude. However, the in- 
direct evidence indicates that this may have been 
the case. Indications to corroborate this were also 
secured through conversations with the testees. It 
was revealed that the same description of a 
person given in the test was interpreted alto- 
gether differently by different supervisors who 
took the test. 

An interesting result which is perhaps of sec- 
ondary importance was the fact that persons who 
scored very high on the test of general mental 
ability scored lower as a group on the Social 
Judgment Test than did those who scored in the 
low and middle ranges. In order to determine 
possible reasons for this an item analysis was 
carried out, comparing the responses on the 
Social Judgment Test of those scoring in the 
upper and lower 15 per cent on the Personnel
Test. On the basis of this, there seemed to be 
some evidence to show that the high intelligence 
group may have been using some bits of in- 
formation given in the descriptions for predic- 
tion, which the others did not use at all. That is, 
the descriptions may have presented a more 
highly structured picture of the characters for 
the supervisors who scored high in general 
mental ability than they did for the rest of the 
group. In many cases the prediction made from 
such information scored against the testee ac- 
cording to the key. 
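An item analysis of the kind described above can be sketched as follows. This is only an illustration under stated assumptions: the data are hypothetical, the function names are invented for the example, and the normal-approximation test for a difference between two proportions is a common choice, not necessarily the procedure used in this study.

```python
import math

def extreme_groups(scores, fraction=0.15):
    """Return index lists for the lowest and highest `fraction` of scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    k = max(1, round(len(scores) * fraction))
    return order[:k], order[-k:]

def proportion_choosing(responses, members, alternative):
    """Proportion of the given group members who chose `alternative`."""
    return sum(responses[i] == alternative for i in members) / len(members)

def two_proportion_z(p_high, n_high, p_low, n_low):
    """Normal-approximation z for the difference between two proportions."""
    pooled = (p_high * n_high + p_low * n_low) / (n_high + n_low)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_high + 1 / n_low))
    return 0.0 if se == 0 else (p_high - p_low) / se

# Hypothetical data: mental ability scores for 20 testees and the
# alternative each one chose on a single Social Judgment Test item.
ability = list(range(1, 21))
item = ["a"] * 10 + ["b"] * 10

low, high = extreme_groups(ability)          # lower and upper 15 per cent
p_low = proportion_choosing(item, low, "b")
p_high = proportion_choosing(item, high, "b")
z = two_proportion_z(p_high, len(high), p_low, len(low))
print(f"low: {p_low:.2f}  high: {p_high:.2f}  z = {z:.2f}")
```

Repeating the comparison for every alternative of every item would show on which items the two extreme groups differ.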

However, on many items the differences be- 
tween the high and low intelligence groups did 
not fit the above pattern. That is, on these items 
the high group predicted behavior which re- 
flected social attitudes characteristic of the poor 
supervisors, and there seemed to be no basis at 
all for such a prediction in the description given. 
This suggested that perhaps there was to some 
degree a true curvilinear relationship between 
intelligence and the factor which the test was 
measuring. It might be possible, for example, 
that it is more difficult for the highly intelligent
person to see the other person’s point of view. 
He may be more inclined to react to logic than 
to feelings. 
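One conventional way to check for a curvilinear relationship of this kind is to compare the squared product-moment correlation with the squared correlation ratio (eta squared) computed over score bands. The sketch below is illustrative only: the scores are hypothetical, the three-band split is an assumption, and the monograph does not state that this computation was carried out.

```python
from statistics import mean

def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def correlation_ratio_squared(groups):
    """Eta squared: between-group sum of squares over total sum of squares."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    total = sum((v - grand) ** 2 for v in values)
    return between / total

# Hypothetical scores: nine testees ordered by mental ability, with the
# highest-ability band scoring lowest on the social judgment measure.
ability = [1, 2, 3, 4, 5, 6, 7, 8, 9]
judgment = [5, 6, 7, 6, 7, 8, 2, 3, 4]

r = pearson_r(ability, judgment)
bands = [judgment[0:3], judgment[3:6], judgment[6:9]]  # low, middle, high
eta2 = correlation_ratio_squared(bands)
print(f"r squared = {r * r:.3f}, eta squared = {eta2:.3f}")
```

When eta squared is appreciably larger than r squared, a straight line describes the relationship poorly, which is what a curvilinear relationship would produce.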


Turning next to the Personal Data
Inventory, we find that the results obtained on it
provide the basis for many interesting
hypotheses.


One such result is the fact that the good 
supervisors had participated much more in all 
athletics except the combative sports, boxing and 
wrestling, in which they had participated less. 
Could this indicate a greater need for the mani- 
festation of aggression by the poor supervisor? 
Another such result which might be interesting 
to the student of personality is the fact that 
more of the poor supervisors indicated that they 
were more fond of their mother than of their 
father, and they also indicated that they had 
been punished more often by their father than 
by their mother. It might be that there is some 
relationship between the quality of leadership 
which one exhibits and the parent with whom 
he identifies. That is, specifically, the good leader 
may tend to identify with his father. 

The responses of the low criterion group to 
the items in the Personal Data Inventory show 
several interesting inconsistencies. For example, 
the results showed that the good leaders partici- 
pated much more in athletics than did the poor, 
yet the amount of exercise which the supervisors 
reported that they got was greater for the poor 
than for the good leaders. Another example of 
this occurred on items referring to drinking 
activities. The poor leaders were much more 
likely to say they never drink, but also much 
more likely to prefer the occupation of “bar- 
tender” than were the good leaders. As a third 
example, the poor supervisors in the Trades and 
Operating group participated much more in 
card playing, pool, and billiards than did the 
high criterion group, yet a much larger pro- 
portion also indicated that they “never make 
wagers.” There was other evidence of a similar 
nature to indicate that the poor supervisors were 
evidently somewhat defensive about admitting
the possession of any reprehensible character-
istics.

An interesting incidental result of this 
study was the fact that scores on the 
Word Fluency test were significantly re- 
lated to the leadership criterion measure 
for the supervisors of office workers. Little 
information is provided by the data to 
indicate possible reasons for this. The 
test is generally considered to be a measure
of linguistic ability, since scores on
the test seem to be related to ability in 
this area. It is the opinion of some psy- 
chologists that the Word Fluency factor 
may be found to be the same as the more 
general fluency factor as defined in per- 
sonality studies. If this is found to be 
true, the results found in this study may 
have interesting implications for the 
study of leadership. 


B. IMPLICATIONS 


The major contribution which this 
study makes is probably the usefulness 
of the results for predicting success in 
leadership situations similar to those 
studied. To what extent are the results 
of this study applicable to other leader- 
ship situations? The supervisors and 
types of work groups led are probably 
fairly representative of those generally 
found in utility companies. Therefore,
the results are probably most applicable 
to supervisory situations in companies of 
this type. To the extent to which the 
supervisors and types of work groups 
studied are similar to those found in 
other industries, the results of this study 
probably will be applicable to those situ- 
ations also. The fact that the differences 
found between those rated good and 
poor in human relations ability were 
little different for supervisors of either 
office or manual workers implies that the 
results may have some applicability to 
work-group leadership situations in gen- 
eral. 

What implication does this study have 
for the leadership training of supervisors? 
It is commonly conceded that training in 
leadership skills generally changes the 
leader’s performance on the job very 
little. It was found in this study that a 
test which measured the supervisor’s
knowledge of how particular leadership
situations should be handled was unsuc- 
cessful. However, a test which measured 
social perception was successful. That is, 
the Social Judgment Test, which meas- 
ured the way in which individuals ex- 
perienced particular stimulus situations 
involving social interaction, did show a 
significant relationship to the supervisor’s 
quality of performance as a leader. 

The way in which a person perceives 
a situation is determined by his attitudes. 
Therefore, this implies that, in order to 
change one’s performance as a supervisor, 
perhaps leadership training should concentrate
on changing attitudes, rather
than attempting to change performance
by teaching skills. Such an approach to 
leadership training has recently been ad- 
vocated by Maier (8). 

This study is but an exploration into a 
field in which there is a need for objecti- 
fying a great deal of opinion. The study 
should stimulate the investigation of the 
relationship between supervisory leader- 
ship success and other personality char- 
acteristics. As an immediate next step, it 
is proposed that it should be determined 
to what extent some of the differences 
found in this study are common to other 
leadership situations. 


CHAPTER VII


SUMMARY


The general purpose of this study was
to identify certain measurable char-
acteristics related to success in the human
relations aspect of work-group leader-
ship. The study was carried out in a
large utility company. Several tests were
administered to, and other quantifiable
data were obtained from, approximately
two hundred first-line supervisors. These
were supervisors of various types of work-
groups, including supervisors of office
workers (such as of typists, clerks, ac-
countants) and supervisors of manual
workers (such as of power plant workers,
linemen, repairmen, and the like).

In addition, detailed ratings of the
supervisors’ on-the-job performance were
obtained with a rating scale of the check-
list type. This provided a detailed de-
scription, from which an objective score
could be derived, of the way in which
each supervisor handled the human rela-
tions aspect of his job. The data were
analyzed for supervisors of office work-
ers and supervisors of manual workers
considered separately, as well as for the
sample as a whole.

Whereas a test to measure knowledge
of leadership skills was unsuccessful, a
test designed to measure one’s social atti-
tudes was found to correlate significantly
with the criterion measure of supervisory
success. This relationship was found to
be equally significant for supervisors of
the two different types of work groups.
The test was designed to function as a
projective test. The testee was asked
to predict the behavior of other persons,
who were described briefly, in certain
interpersonal situations. It was felt that
in making this prediction the testee
would reveal his true attitude toward
other persons through projection. Evi-
dence was presented to indicate that the
good supervisor regards others as indi-
viduals with motives, feelings, and goals
of their own, whereas the poor super-
visor is more likely to perceive others in
relation to his own motives or goals.

An inventory for obtaining informa-
tion concerning one’s interests and back-
ground or personal history was also ad-
ministered to the supervisors. Many
more differences which attained at least
the 5 per cent level of significance were
found between “good” and “poor” super-
visors on interest items than on personal
history items. For example, the good su-
pervisors had participated more in most
athletic sports than had the poor, accord-
ing to their reports. Some tendency was
evident for more of the good supervisors
to prefer occupations which are general-
ly assumed to require ability in dealing
with people. The parents of good super-
visors were reported to have had more
education and to have been happier than
those of the poor supervisors. Differences
were found on many other items of a
similar nature.

A significant relationship was found 
between scores on the Word Fluency test 
and the criterion measure of success in 
human relations for supervisors of office 
employees; but no such relationship was 
found for supervisors of manual work- 
ers. 

It was concluded that, although the 
major contribution which the study 
makes is probably the usefulness of the 
results for predicting supervisory success, 
the study also has important implications 
for training leaders. That is, since a test 
to measure knowledge of leadership skills 
was unsuccessful, while scores on a test
designed to measure one’s social atti-
tudes did correlate with quality of leader-
ship, perhaps leadership training should
concentrate on changing attitudes, rather
than attempting to change performance
by teaching skills.